Keywords: Retrieval-Augmented Generation, Consistency, Robustness, RL, LLMs
TL;DR: We propose Con-RAG, a reinforcement learning framework that improves information consistency in RAG systems by aligning outputs across paraphrased queries, even under retriever variability, without requiring ground-truth supervision.
Abstract: Retrieval-augmented generation (RAG) systems are increasingly deployed in high-stakes domains where users expect outputs to be consistent across semantically equivalent queries. However, existing RAG systems often exhibit significant inconsistencies due to variability in both the retriever and the generator (LLM), undermining trust and reliability. In this work, we focus on \emph{information consistency}—the requirement that outputs convey the same core content and information across semantically equivalent inputs. We introduce a principled evaluation framework that decomposes RAG consistency into retriever-level, generator-level, and end-to-end components, enabling systematic diagnosis of inconsistency sources.
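To make the decomposition concrete, the following is a minimal, hypothetical sketch (not the paper's implementation): `retrieve`, `generate`, and `similarity` are assumed stand-ins supplied by the user, and consistency at each level is measured as average pairwise similarity within a group of paraphrased queries.

```python
from itertools import combinations
from statistics import mean


def pairwise_consistency(items, similarity):
    """Average pairwise similarity within a group of outputs."""
    pairs = list(combinations(items, 2))
    return mean(similarity(a, b) for a, b in pairs) if pairs else 1.0


def diagnose_consistency(paraphrases, retrieve, generate, similarity):
    """Decompose consistency into retriever-, generator-, and end-to-end
    levels for a group of semantically equivalent queries."""
    contexts = [retrieve(q) for q in paraphrases]

    # Retriever-level: do paraphrased queries retrieve similar evidence?
    retriever_c = pairwise_consistency(contexts, similarity)

    # Generator-level: with retrieval held fixed, does the generator
    # answer paraphrases consistently?
    fixed_ctx = contexts[0]
    gen_outputs = [generate(q, fixed_ctx) for q in paraphrases]
    generator_c = pairwise_consistency(gen_outputs, similarity)

    # End-to-end: variability of the full pipeline (retriever + generator).
    e2e_outputs = [generate(q, c) for q, c in zip(paraphrases, contexts)]
    end_to_end_c = pairwise_consistency(e2e_outputs, similarity)

    return {"retriever": retriever_c,
            "generator": generator_c,
            "end_to_end": end_to_end_c}
```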
To improve consistency, we propose Information \textbf{Con}sistent \textbf{RAG} (Con-RAG), a reinforcement learning approach that leverages Group Relative Policy Optimization (GRPO) with multiple rollouts per query to compute paraphrase \emph{group similarity rewards}, training the generator to produce consistent outputs across paraphrased queries and to remain robust to retrieval-induced variability. We also introduce a scalable approximation that reduces the cost of reward computation, making Con-RAG practical for large-scale training. Empirical evaluations across five QA benchmarks, spanning short-form, multi-hop, and long-form tasks, demonstrate that Con-RAG significantly improves both consistency and accuracy over strong baselines, even in the absence of explicit ground-truth supervision. Our work provides practical solutions for evaluating and building reliable RAG systems for safety-critical deployments.
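Below is a minimal sketch of how a paraphrase group-similarity reward of this kind could be computed over GRPO-style rollouts; `embed` is an assumed sentence-embedding function, and the exact reward and its scalable approximation in Con-RAG may differ.

```python
import numpy as np


def group_similarity_rewards(rollouts_per_paraphrase, embed):
    """Reward each rollout by its average cosine similarity to rollouts
    generated for the *other* paraphrases of the same query.

    rollouts_per_paraphrase: list of lists of generated answers,
    one inner list per paraphrased query.
    Returns a matching list of lists of scalar rewards.
    """
    # Embed and L2-normalize each rollout so dot products are cosine similarities.
    groups = [np.stack([embed(r) for r in rollouts])
              for rollouts in rollouts_per_paraphrase]
    groups = [g / np.linalg.norm(g, axis=1, keepdims=True) for g in groups]

    rewards = []
    for i, g_i in enumerate(groups):
        # Compare rollouts for paraphrase i against rollouts for all other paraphrases.
        others = np.concatenate([g for j, g in enumerate(groups) if j != i])
        sims = g_i @ others.T                       # (n_i, n_others) cosine similarities
        rewards.append(sims.mean(axis=1).tolist())  # higher = more consistent across paraphrases
    return rewards
```

In this sketch, the per-rollout rewards would then be fed to a GRPO-style update, which normalizes rewards within each group to form relative advantages.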
Submission Number: 96