Rethinking LLM Parametric Knowledge as Confidence for Effective and Efficient Retrieval-Augmented Generation

ACL ARR 2026 January Submission 10396 Authors

06 Jan 2026 (modified: 20 Mar 2026) · CC BY 4.0
Keywords: Knowledge Boundary, Evaluation, Large Language Models, Retrieval-Augmented Generation, Reranker, Generator
Abstract: Retrieval-Augmented Generation (RAG) alleviates hallucinations in Large Language Models (LLMs) by leveraging external knowledge, but key challenges persist in retrieving high-utility context and in deciding whether to trigger retrieval at all when addressing domain-specific questions. Current methods overlook the rich information embedded in LLMs’ continuous internal hidden states, yet the changes in these states induced by different retrieved documents inherently serve as natural preference signals. To address this, we propose a method that guides retrieval (and reranking) based on changes in the target LLM’s internal confidence. First, we construct a confidence detection model from the LLM’s internal hidden states to quantify how much a retrieved context enhances the model’s confidence. Second, we use this model to build a preference dataset for fine-tuning a reranker, enabling it to prioritize contexts favored by the downstream LLM. Additionally, we introduce the CBDR mechanism, which adaptively triggers retrieval based on the LLM’s initial confidence in the original question, reducing knowledge conflicts and improving efficiency. Experimental results demonstrate significant improvements in both context screening accuracy and end-to-end RAG performance: when dynamic retrieval is activated, the system’s accuracy increases by 5.6 percentage points (pp) while retrieval cost decreases by 7.1 pp, substantially enhancing practical utility while maintaining competitive accuracy.
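The adaptive triggering idea described in the abstract — answer from parametric knowledge when the model is already confident, retrieve otherwise — can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the confidence probe, retriever, generator, and threshold value (`tau`) are all hypothetical placeholders standing in for the paper's learned components.

```python
# Hypothetical sketch of confidence-gated retrieval: skip retrieval when the
# LLM's confidence in answering the question from its own parametric knowledge
# exceeds a threshold, otherwise fall back to standard RAG. All callables and
# the threshold are illustrative stand-ins, not the paper's actual models.

def answer_with_adaptive_retrieval(question, confidence_probe, retrieve, generate, tau=0.7):
    """Return (answer, used_retrieval) for a question.

    confidence_probe: question -> float in [0, 1]
        (in the paper, a detector trained on the LLM's internal hidden states)
    retrieve: question -> list of context documents
    generate: (question, context) -> answer string
    """
    conf = confidence_probe(question)
    if conf >= tau:
        # Model is confident: answer directly, avoiding retrieval cost
        # and potential knowledge conflicts with retrieved text.
        return generate(question, context=None), False
    # Low confidence: retrieve supporting context, then generate.
    docs = retrieve(question)
    return generate(question, context=docs), True


# Toy stand-ins so the sketch is runnable end to end.
probe = lambda q: 0.9 if "capital" in q else 0.2
retrieve = lambda q: ["doc about " + q]
generate = lambda q, context=None: f"answer({q}, ctx={context})"

ans, used_rag = answer_with_adaptive_retrieval("capital of France?", probe, retrieve, generate)
```

In this sketch a high-confidence question bypasses retrieval entirely, which is where the reported cost reduction comes from; the accuracy gain depends on the probe correctly identifying questions the LLM cannot answer alone.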
Paper Type: Long
Research Area: Retrieval-Augmented Language Models
Research Area Keywords: retrieval-augmented generation, re-ranking, fine-tuning
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 10396