Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception

ACL ARR 2025 February Submission7066 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: The internal states of large language models (LLMs) have been shown to be effective indicators of the factuality of their responses. Most existing studies focus on internal states extracted after generation, which introduces extra overhead, and it remains unclear whether extracting these states after generation is necessary. Moreover, controlling the risk associated with model outputs is essential, especially in safety-critical domains, which requires accurately detecting what LLMs do not know. To address these concerns, this paper exploits LLMs' internal states to enhance knowledge boundary perception from two perspectives: efficiency improvement and risk mitigation. Specifically, we 1) investigate LLMs' ability to assess the factuality of their responses using internal states both before and after generation. Our experiments on three factual QA benchmarks demonstrate that LLMs can perceive the correctness of their answers before generation, and that this perception is further enhanced after generation. 2) introduce $C^3$ (Consistency-based Confidence Calibration), a technique that calibrates model confidence by evaluating the consistency of its confidence across different reformulations of the same question. We show that $C^3$ considerably improves LLMs' ability to recognize what they do not know. We recommend using pre-generation predictions in high-efficiency scenarios and applying $C^3$ in safety-critical applications.
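To make the consistency-based calibration idea concrete, the sketch below illustrates one plausible reading of $C^3$ as described in the abstract: a question's confidence is down-weighted when the model's confidence varies across reformulations of the same question. This is a minimal sketch only; the helpers `get_confidence` and `paraphrase_question`, the spread-based consistency score, and all parameter names are assumptions for illustration, not the authors' actual implementation.

```python
# Sketch of consistency-based confidence calibration (C^3), based only on the
# abstract's description. `get_confidence` and `paraphrase_question` are
# hypothetical stand-ins supplied by the caller.
from statistics import mean, pstdev
from typing import Callable, List


def c3_calibrated_confidence(
    question: str,
    get_confidence: Callable[[str], float],               # model confidence in [0, 1]
    paraphrase_question: Callable[[str, int], List[str]],  # n reformulations of the question
    n_reformulations: int = 5,
) -> float:
    """Calibrate confidence by penalizing inconsistency across reformulations."""
    variants = [question] + paraphrase_question(question, n_reformulations)
    confidences = [get_confidence(q) for q in variants]

    avg_conf = mean(confidences)
    # One simple consistency measure: the spread of confidences across variants.
    # A larger spread means less consistent confidence, so the average is scaled down.
    spread = pstdev(confidences)
    consistency = max(0.0, 1.0 - spread)

    return avg_conf * consistency
```

Under this reading, answers whose calibrated confidence falls below a chosen threshold would be flagged as "unknown", matching the risk-mitigation use case the abstract recommends for safety-critical applications.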
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: LLMs' Knowledge Boundary; LLMs' Confidence Estimation
Contribution Types: Data analysis
Languages Studied: English
Submission Number: 7066