Right Routing, Right Answering: Joint Path-Answer Preference Optimization for Retrieval-Augmented Generation

ACL ARR 2026 January Submission10184 Authors

06 Jan 2026 (modified: 20 Mar 2026) · License: CC BY 4.0
Keywords: Retrieval-Augmented Generation, Large Language Model, Preference Optimization
Abstract: Retrieval-Augmented Generation (RAG) often suffers from noisy or irrelevant retrievals, which can substantially undermine the quality of generated answers. While recent methods such as Retrieval Preference Optimization (RPO) empower LLMs to adaptively decide whether to use retrieved content, they focus primarily on improving this routing decision. A critical oversight is the lack of supervision over answer quality within a selected path: even when routing is correct, the correctness of the final answer is not adequately guaranteed. To address this, we propose Joint Path-Answer Preference Optimization (JPAPO), a novel framework that jointly optimizes both path routing and within-path answering. The solution is simple yet effective: three strategically designed preference pairs tackle this dual challenge while keeping the framework easy to integrate and scale. Extensive experiments across diverse benchmarks and LLM backbones demonstrate the framework's effectiveness, with improvements of up to 5.9% over RPO.
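The abstract describes joint optimization over a routing decision and within-path answers via three preference pairs. The paper's actual pair construction is not specified here, so the following is only a hedged sketch of one plausible instantiation: a routing pair plus one answer-quality pair per path, each scored with a standard DPO-style loss. All function names and the pair taxonomy (`routing`, `answer_with_retrieval`, `answer_parametric`) are illustrative assumptions, not the authors' definitions.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair:
    -log sigmoid(beta * ((logp_c - ref_c) - (logp_r - ref_r))).
    Log-probabilities would come from the policy and frozen reference model."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

def build_preference_pairs(question, good_retrieval_ans, bad_retrieval_ans,
                           good_parametric_ans, bad_parametric_ans,
                           retrieval_helpful):
    """Hypothetical three-pair construction (an assumption, not JPAPO's spec):
    1) a routing pair preferring the response from the correct path, and
    2)-3) an answer-quality pair inside each path, so correct routing alone
    is not enough -- the chosen path must also produce a good answer."""
    if retrieval_helpful:
        route_chosen, route_rejected = good_retrieval_ans, good_parametric_ans
    else:
        route_chosen, route_rejected = good_parametric_ans, good_retrieval_ans
    return [
        {"type": "routing", "prompt": question,
         "chosen": route_chosen, "rejected": route_rejected},
        {"type": "answer_with_retrieval", "prompt": question,
         "chosen": good_retrieval_ans, "rejected": bad_retrieval_ans},
        {"type": "answer_parametric", "prompt": question,
         "chosen": good_parametric_ans, "rejected": bad_parametric_ans},
    ]

pairs = build_preference_pairs(
    "Who wrote Hamlet?",
    good_retrieval_ans="Shakespeare (per retrieved passage).",
    bad_retrieval_ans="Marlowe.",
    good_parametric_ans="Shakespeare.",
    bad_parametric_ans="Unknown.",
    retrieval_helpful=True,
)
# The joint objective would sum the per-pair losses.
joint = sum(dpo_loss(-1.0, -2.0, -1.5, -1.5) for _ in pairs)
```

A larger chosen-vs-rejected log-probability margin drives the per-pair loss toward zero, which is what pushes the model to prefer both the right path and the right answer within it.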
Paper Type: Long
Research Area: Retrieval-Augmented Language Models
Research Area Keywords: Language Modeling, Generation
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 10184