Right Routing, Right Answering: Joint Path-Answer Preference Optimization for Retrieval-Augmented Generation

ACL ARR 2026 January Submission10184 Authors

06 Jan 2026 (modified: 20 Mar 2026) · License: CC BY 4.0
Keywords: Retrieval-Augmented Generation, Large Language Model, Preference Optimization
Abstract: Retrieval-Augmented Generation (RAG) often suffers from noisy or irrelevant retrievals, which can substantially undermine the quality of generated answers. While recent methods such as Retrieval Preference Optimization (RPO) empower LLMs to adaptively decide whether to use retrieved content, they focus primarily on improving this routing decision. A critical oversight is the lack of supervision over answer quality within a selected path: even when routing is correct, the correctness of the final answer is not adequately guaranteed. To address this, we propose Joint Path-Answer Preference Optimization (JPAPO), a novel framework that jointly optimizes both path routing and within-path answering. The solution is simple yet effective: three strategically designed preference pairs tackle this dual challenge while keeping the framework easy to integrate and scale. Extensive experiments across diverse benchmarks and LLM backbones demonstrate the framework's effectiveness, with improvements of up to 5.9% over RPO.
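The abstract describes joint optimization over a routing decision and within-path answers via three preference pairs. The paper's actual pair construction is not specified here, so the following is only a hedged sketch of one plausible instantiation: a routing pair plus one answer-quality pair per path, each scored with a standard DPO-style loss. All function names and the pair taxonomy (`routing`, `answer_with_retrieval`, `answer_parametric`) are illustrative assumptions, not the authors' definitions.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair:
    -log sigmoid(beta * ((logp_c - ref_c) - (logp_r - ref_r))).
    Log-probabilities would come from the policy and frozen reference model."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

def build_preference_pairs(question, good_retrieval_ans, bad_retrieval_ans,
                           good_parametric_ans, bad_parametric_ans,
                           retrieval_helpful):
    """Hypothetical three-pair construction (an assumption, not JPAPO's spec):
    1) a routing pair preferring the response from the correct path, and
    2)-3) an answer-quality pair inside each path, so correct routing alone
    is not enough -- the chosen path must also produce a good answer."""
    if retrieval_helpful:
        route_chosen, route_rejected = good_retrieval_ans, good_parametric_ans
    else:
        route_chosen, route_rejected = good_parametric_ans, good_retrieval_ans
    return [
        {"type": "routing", "prompt": question,
         "chosen": route_chosen, "rejected": route_rejected},
        {"type": "answer_with_retrieval", "prompt": question,
         "chosen": good_retrieval_ans, "rejected": bad_retrieval_ans},
        {"type": "answer_parametric", "prompt": question,
         "chosen": good_parametric_ans, "rejected": bad_parametric_ans},
    ]

pairs = build_preference_pairs(
    "Who wrote Hamlet?",
    good_retrieval_ans="Shakespeare (per retrieved passage).",
    bad_retrieval_ans="Marlowe.",
    good_parametric_ans="Shakespeare.",
    bad_parametric_ans="Unknown.",
    retrieval_helpful=True,
)
# The joint objective would sum the per-pair losses.
joint = sum(dpo_loss(-1.0, -2.0, -1.5, -1.5) for _ in pairs)
```

A larger chosen-vs-rejected log-probability margin drives the per-pair loss toward zero, which is what pushes the model to prefer both the right path and the right answer within it.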
Paper Type: Long
Research Area: Retrieval-Augmented Language Models
Research Area Keywords: Language Modeling, Generation
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 10184