Keywords: Retrieval-Augmented Generation (RAG), Test-time Scaling
Abstract: Retrieval-augmented generation (RAG) has become a widely adopted paradigm for building knowledge-grounded large language models (LLMs). However, standard RAG pipelines often fail to ensure that model reasoning remains consistent with the retrieved evidence, leading to factual inconsistencies or unsupported conclusions. In this work, we reinterpret RAG as \textit{Retrieval-Augmented Reasoning} and identify a central but underexplored problem: \textit{Reasoning Misalignment}, the divergence between an LLM's internal reasoning trajectory and the evidential constraints provided by retrieval. To address this issue, we propose \textsc{AlignRAG}, a novel iterative framework grounded in \textit{Critique-Driven Alignment (CDA)}. We further introduce \textsc{AlignRAG-auto}, an autonomous variant that terminates refinement dynamically, removing the need to pre-specify the number of critique iterations. At the heart of \textsc{AlignRAG} lies a \textit{contrastive critique synthesis} mechanism that generates retrieval-sensitive critiques while mitigating self-bias. These critiques, labeled to distinguish evidence-aligned from misaligned reasoning, are used to train a dedicated retrieval-augmented \textit{Critic Language Model (CLM)}. Empirical evaluations show that our approach significantly improves reasoning fidelity. Our 8B-parameter CLM improves performance over the Self-Refine baseline by \textbf{12.1\%} on out-of-domain tasks and outperforms a standard 72B-parameter CLM by \textbf{2.2\%}. Furthermore, \textsc{AlignRAG-auto} achieves this state-of-the-art performance while dynamically determining the optimal number of refinement steps, enhancing efficiency and usability. \textsc{AlignRAG} remains compatible with existing RAG architectures as a \textit{plug-and-play} module and demonstrates strong robustness under both informative and noisy retrieval scenarios. Overall, \textsc{AlignRAG} offers a principled solution for aligning model reasoning with retrieved evidence, substantially improving the factual reliability and robustness of RAG systems. Our source code is provided at \href{https://github.com/upup-wei/RAG-ReasonAlignment}{link}.
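The sketch below illustrates the critique-driven refinement loop the abstract describes: an LLM produces retrieval-augmented reasoning, a Critic Language Model (CLM) issues a retrieval-sensitive critique, and the reasoning is regenerated conditioned on that critique, with the \textsc{AlignRAG-auto} variant stopping when the critic signals no further misalignment. This is a minimal, hypothetical sketch under assumed interfaces; the function names, the \texttt{NO\_MISALIGNMENT} sentinel, and the fixed iteration budget are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of a Critique-Driven Alignment (CDA) loop, assuming
# callable interfaces for the generator LLM and the critic (CLM); not the
# authors' actual code from the linked repository.
from typing import Callable, List, Optional


def critique_driven_alignment(
    query: str,
    evidence: List[str],
    generate: Callable[[str, List[str], Optional[str]], str],   # LLM: (query, evidence, critique) -> reasoning
    critique: Callable[[str, List[str], str], str],             # CLM: (query, evidence, reasoning) -> critique
    max_iters: int = 3,
    auto_stop: bool = False,
) -> str:
    """Iteratively refine a reasoning trace so it stays aligned with retrieved evidence."""
    reasoning = generate(query, evidence, None)          # initial retrieval-augmented reasoning
    for _ in range(max_iters):
        feedback = critique(query, evidence, reasoning)  # retrieval-sensitive critique from the CLM
        # In an AlignRAG-auto-style variant, the critic may signal that no further
        # refinement is needed; "NO_MISALIGNMENT" is an assumed sentinel here.
        if auto_stop and feedback.strip() == "NO_MISALIGNMENT":
            break
        reasoning = generate(query, evidence, feedback)  # regenerate conditioned on the critique
    return reasoning
```

In this reading, the loop is model-agnostic: any existing RAG pipeline could supply the `generate` callable, which is consistent with the abstract's claim that \textsc{AlignRAG} acts as a plug-and-play module.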
Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)
Submission Number: 4764