CREAM-RAG: Enhanced Retrieval Augmented Generation to Limit Hallucination through Consistency-based Self-RAG
Keywords: Retrieval-Augmented Generation (RAG), Self-rewarding language models, Consistency regularization, Direct Preference Optimization (DPO), Reinforcement learning from AI feedback (RLAIF), Hallucination mitigation, Evidence grounding, Rank-aware retrieval, Actor-critic generation, Reward stability, Temporal consistency, Long-form question answering, Multi-hop reasoning
TL;DR: CREAM-RAG stabilizes self-reward signals in retrieval-augmented generation via consistency regularization, reducing hallucinations and improving factual accuracy; with LLaMA-2-7B it outperforms Self-RAG on long-form QA benchmarks.
Abstract: Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by grounding responses in external evidence, mitigating hallucinations, and enabling access to up-to-date, domain-specific knowledge. However, existing RAG frameworks often suffer from unstable self-supervised optimization signals and inconsistent factual grounding. We introduce CREAM-RAG (Consistency-Regularized Enhanced Augmented Model for RAG) (Wang et al., 2025c), a unified framework that integrates retrieval, Direct Preference Optimization (DPO)-based self-reward reinforcement learning, and a consistency regularization objective to stabilize reward dynamics during fine-tuning. By enforcing alignment between multiple retrieved contexts and generated responses, CREAM-RAG improves factual faithfulness and semantic coherence without external supervision. Empirical evaluations on the LLaMA-2-7B model demonstrate that CREAM-RAG achieves a 35.04% average improvement over the base model across reasoning and factuality benchmarks, highlighting its effectiveness in reducing hallucinations and enhancing retrieval-grounded reasoning.
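The abstract does not spell out the training objective, but the described combination of DPO-based self-reward learning with a consistency regularizer can be sketched as follows. This is a minimal illustration only: the function names, the margin-agreement form of the regularizer, and the weighting coefficient `lam` are assumptions, not details taken from the paper.

```python
# Illustrative sketch of a consistency-regularized DPO objective.
# Assumption: the regularizer penalizes disagreement among the self-reward
# preference margins computed under different retrieved contexts; the actual
# CREAM-RAG formulation may differ.
import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss over (chosen, rejected) response log-probabilities."""
    logits = beta * ((policy_chosen_logps - ref_chosen_logps)
                     - (policy_rejected_logps - ref_rejected_logps))
    return -F.logsigmoid(logits).mean()


def consistency_regularizer(margins_per_context):
    """Variance-style penalty on self-reward margins across retrieved contexts.

    margins_per_context: tensor of shape (num_contexts, batch), where each row
    holds the preference margin the model assigns to the same response pair
    when conditioned on a different retrieved context.
    """
    mean_margin = margins_per_context.mean(dim=0, keepdim=True)
    return ((margins_per_context - mean_margin) ** 2).mean()


def cream_rag_objective(policy_chosen_logps, policy_rejected_logps,
                        ref_chosen_logps, ref_rejected_logps,
                        margins_per_context, beta=0.1, lam=0.5):
    """DPO term plus a consistency penalty; lam is a hypothetical weight."""
    return (dpo_loss(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps, beta)
            + lam * consistency_regularizer(margins_per_context))


# Toy usage: 3 retrieved contexts, a batch of 4 preference pairs.
b = torch.randn(4)
loss = cream_rag_objective(b + 1.0, b - 1.0, b, b, torch.randn(3, 4))
```

Under this reading, the consistency term discourages the self-reward signal from flipping its preference depending on which retrieved passage is in context, which is one plausible way to stabilize reward dynamics during fine-tuning as the abstract describes.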
Submission Number: 28