DENSE-RAG: Measuring and Improving Context Understanding for Consistent Retrieval-Augmented Generation

ICLR 2026 Conference Submission 12964 Authors

18 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Retrieval-Augmented Generation; Large Language Model; Open-book QA
TL;DR: We propose DENSE, an unsupervised method to evaluate LLMs' understanding of context in RAG, and DENSE-RAG, which improves RAG performance on open-book QA tasks guided by the DENSE evaluation.
Abstract: Retrieval-Augmented Generation (RAG) has significantly advanced LLM performance on knowledge-intensive tasks. However, when LLMs misinterpret retrieved content, they often revert to pre-trained parametric knowledge or generate hallucinated responses, undermining RAG effectiveness. This highlights the need to assess LLMs’ understanding of the retrieved context, since effective measurement is essential for evaluating and improving RAG performance. In this work, we address this problem by proposing DEgree-based uNcertainty with Semantically Equivalent contexts (DENSE), a training-free and model-agnostic method for evaluating LLM understanding of retrieved documents. DENSE constructs semantically equivalent contexts and introduces a degree-based entropy to quantify the semantic uncertainty of responses. Building on DENSE, we further introduce DENSE-RAG, which includes two training-free DENSE-guided modules: adaptive semantic chunking and iterative context refinement. Extensive experiments on open-book QA datasets show that higher DENSE uncertainty correlates with lower QA performance, validating DENSE as a reliable indicator of LLM context understanding. DENSE-RAG also achieves performance competitive with state-of-the-art baselines without introducing additional models or fine-tuning.
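The abstract does not give DENSE's exact degree-based formula, so the sketch below illustrates only the general idea: generate answers under semantically equivalent versions of the retrieved context and measure how much those answers disagree. It uses entropy over semantic clusters of answers as a stand-in for the paper's degree-based entropy, and all helper callables (paraphrase_context, generate_answer, are_equivalent) are hypothetical placeholders for an LLM and a semantic-equivalence judge.

```python
# Illustrative sketch only; not the paper's actual DENSE formulation.
import math
from typing import Callable, List


def context_uncertainty(
    context: str,
    question: str,
    paraphrase_context: Callable[[str, int], List[str]],  # hypothetical: returns semantically equivalent contexts
    generate_answer: Callable[[str, str], str],            # hypothetical: LLM call (context, question) -> answer
    are_equivalent: Callable[[str, str], bool],            # hypothetical: semantic equivalence judge (e.g. NLI-based)
    num_variants: int = 5,
) -> float:
    """Higher value = answers disagree more across semantically equivalent contexts."""
    contexts = [context] + paraphrase_context(context, num_variants)
    answers = [generate_answer(c, question) for c in contexts]

    # Greedily cluster answers by pairwise semantic equivalence.
    clusters: List[List[str]] = []
    for ans in answers:
        for cluster in clusters:
            if are_equivalent(ans, cluster[0]):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])

    # Entropy of the cluster-size distribution: 0 if all answers agree,
    # log(n) if every answer lands in its own cluster.
    n = len(answers)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)
```

Under this reading, a low score indicates the model's answer is stable across equivalent phrasings of the retrieved evidence (good context understanding), while a high score flags contexts that DENSE-RAG's chunking and refinement modules would target.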
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 12964