Rethinking LLM Parametric Knowledge as Confidence for Effective and Efficient Retrieval-Augmented Generation

ACL ARR 2026 January Submission 10396 Authors

06 Jan 2026 (modified: 20 Mar 2026) · CC BY 4.0
Keywords: Knowledge Boundary, Evaluation, Large Language Models, Retrieval-Augmented Generation, Reranker, Generator
Abstract: Retrieval-Augmented Generation (RAG) alleviates hallucinations in Large Language Models (LLMs) by leveraging external knowledge, but key challenges persist in retrieving high-utility context and in deciding whether to trigger retrieval at all when addressing domain-specific questions. Current methods overlook the rich information embedded in LLMs’ continuous internal hidden states, yet the changes in these states induced by different retrieved documents inherently serve as natural preference signals. To address this, we propose a method that guides retrieval (and reranking) based on changes in the target LLM’s internal confidence. First, we construct a confidence detection model from the LLM’s internal hidden states to quantify how much a retrieved context enhances the model’s confidence. Second, we use this model to build a preference dataset for fine-tuning a reranker, enabling it to prioritize contexts favored by the downstream LLM. Additionally, we introduce the CBDR mechanism, which adaptively triggers retrieval based on the LLM’s initial confidence in the original question, reducing knowledge conflicts and improving efficiency. Experimental results demonstrate significant improvements in both context screening accuracy and end-to-end RAG performance: when dynamic retrieval is activated, the system’s accuracy increases by 5.6 percentage points (pp) while retrieval cost decreases by 7.1 pp, substantially enhancing practical utility while maintaining competitive accuracy.
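The adaptive triggering idea described in the abstract — answer from parametric knowledge when the model is already confident, retrieve otherwise — can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the confidence probe, retriever, generator, and threshold value (`tau`) are all hypothetical placeholders standing in for the paper's learned components.

```python
# Hypothetical sketch of confidence-gated retrieval: skip retrieval when the
# LLM's confidence in answering the question from its own parametric knowledge
# exceeds a threshold, otherwise fall back to standard RAG. All callables and
# the threshold are illustrative stand-ins, not the paper's actual models.

def answer_with_adaptive_retrieval(question, confidence_probe, retrieve, generate, tau=0.7):
    """Return (answer, used_retrieval) for a question.

    confidence_probe: question -> float in [0, 1]
        (in the paper, a detector trained on the LLM's internal hidden states)
    retrieve: question -> list of context documents
    generate: (question, context) -> answer string
    """
    conf = confidence_probe(question)
    if conf >= tau:
        # Model is confident: answer directly, avoiding retrieval cost
        # and potential knowledge conflicts with retrieved text.
        return generate(question, context=None), False
    # Low confidence: retrieve supporting context, then generate.
    docs = retrieve(question)
    return generate(question, context=docs), True


# Toy stand-ins so the sketch is runnable end to end.
probe = lambda q: 0.9 if "capital" in q else 0.2
retrieve = lambda q: ["doc about " + q]
generate = lambda q, context=None: f"answer({q}, ctx={context})"

ans, used_rag = answer_with_adaptive_retrieval("capital of France?", probe, retrieve, generate)
```

In this sketch a high-confidence question bypasses retrieval entirely, which is where the reported cost reduction comes from; the accuracy gain depends on the probe correctly identifying questions the LLM cannot answer alone.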
Paper Type: Long
Research Area: Retrieval-Augmented Language Models
Research Area Keywords: retrieval-augmented generation, re-ranking, fine-tuning
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 10396