Abstract: Retrieval-Augmented Generation (RAG) significantly enhances Large Language Models (LLMs) on knowledge-intensive tasks. However, traditional static retrieval strategies cannot adapt to the evolving information needs that arise during generation, often yielding insufficient or redundant content. Existing adaptive retrieval methods typically rely on output probabilities or external heuristics, which do not accurately reflect the model's true knowledge needs. To address this, we introduce Semantic Entropy-based Adaptive RAG (SEARAG), which trains a discriminative model to predict binarized semantic entropy from intermediate hidden states, quantifying generation uncertainty in real time.
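As a concrete illustration, the probe below is a minimal sketch of such a discriminative model: a small classifier over a frozen LLM's intermediate hidden states, trained on binary labels obtained by thresholding semantic entropy. The name `SEProbe`, the layer sizes, and the 4096-dimensional hidden state are our assumptions for illustration, not the paper's actual architecture.

```python
# Minimal sketch (not the paper's exact architecture) of a probe that
# predicts binarized semantic entropy from an LLM's hidden states.
# SEProbe, the layer sizes, and the 4096-dim hidden size are assumptions.
import torch
import torch.nn as nn

class SEProbe(nn.Module):
    """Maps an intermediate hidden state to the probability that the
    upcoming sentence has HIGH semantic entropy, i.e., the model is
    uncertain about the meaning of what it is about to generate."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(h)).squeeze(-1)

# Training labels would come from thresholding semantic entropy (computed
# offline by clustering sampled answers by meaning); dummy tensors here.
probe = SEProbe(hidden_dim=4096)
h = torch.randn(8, 4096)                # batch of hidden states
y = torch.randint(0, 2, (8,)).float()   # binarized semantic-entropy labels
loss = nn.BCELoss()(probe(h), y)
loss.backward()
```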
During generation, we reason iteratively, sentence by sentence. If high semantic entropy is detected in an iteration, external knowledge is retrieved to augment generation; otherwise, the process proceeds to the next iteration. This mechanism accurately identifies the model's knowledge needs, reduces redundant retrieval, and improves output quality.
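The following self-contained sketch shows one way this sentence-level adaptive loop could look; `generate_sentence`, `se_probe`, and `retrieve_docs` are hypothetical stand-ins for the paper's LLM, trained probe, and retriever, and the threshold `tau` and stopping heuristic are assumptions.

```python
# Hypothetical sketch of the sentence-level adaptive loop with stub
# components; generate_sentence, se_probe, and retrieve_docs stand in
# for the actual LLM, probe, and retriever. tau is an assumed threshold.
from typing import List, Optional, Tuple

def generate_sentence(question: str, draft: str,
                      context: Optional[List[str]] = None) -> Tuple[str, List[float]]:
    """Stub: produce the next sentence and an intermediate hidden state."""
    return "The answer is X.", [0.0] * 4096

def se_probe(hidden: List[float]) -> float:
    """Stub: trained probe's predicted probability of high semantic entropy."""
    return 0.1

def retrieve_docs(query: str, k: int = 5) -> List[str]:
    """Stub: top-k passages from an external knowledge source."""
    return ["passage"] * k

def searag_generate(question: str, max_steps: int = 10, tau: float = 0.5) -> str:
    answer = ""
    for _ in range(max_steps):
        sentence, hidden = generate_sentence(question, answer)
        if se_probe(hidden) > tau:                  # high uncertainty: retrieve
            docs = retrieve_docs(question + " " + answer)
            sentence, hidden = generate_sentence(question, answer, context=docs)
        answer = (answer + " " + sentence).strip()  # accept the sentence
        if "answer is" in sentence.lower():         # crude stopping heuristic
            break
    return answer
```

The design point this sketch highlights is that retrieval is gated by the probe's reading of internal hidden states rather than by output token probabilities, so it is triggered by the model's internal uncertainty rather than surface-level confidence.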
Experimental results on five multi-hop QA tasks show that SEARAG outperforms existing adaptive RAG methods in both performance and efficiency, confirming its effectiveness and generalization. We release our code in our GitHub repository.
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: retrieval-augmented generation, language models, semantic entropy
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 7939