Keywords: Large Language Models, Frugal AI, Uncertainty Quantification
TL;DR: In this work, we propose a method to detect knowledge boundaries via Query-Level Uncertainty, which estimates whether a model is capable of answering a given query without generating any tokens.
Abstract: It is important for Large Language Models (LLMs) to be aware of the boundary of their knowledge, i.e., to identify which queries they know and which they do not. This type of awareness enables models to perform adaptive inference, such as invoking retrieval-augmented generation (RAG), engaging in slow and deep thinking, or abstaining from answering when appropriate. These mechanisms are beneficial to the development of efficient and trustworthy AI.
In this work, we propose a method to detect knowledge boundaries via \textbf{\emph{Query-Level Uncertainty}}, which estimates whether a model is capable of answering a given query before generating any tokens.
To this end, we propose a novel, training-free method called \textbf{\emph{Internal Confidence}}, which leverages self-evaluations across layers and tokens to provide a reliable signal of uncertainty.
Empirical studies on both factual question answering and mathematical reasoning tasks demonstrate that our internal confidence can outperform several baselines. Furthermore, we showcase that our proposed method can be used for adaptive inference, such as efficient RAG and model cascading, thereby reducing inference costs while preserving overall performance.
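A minimal sketch of the kind of query-level signal the abstract describes (not the authors' exact formulation): prompt the model to self-evaluate whether it can answer the query, read the probability of a "Yes" token from every intermediate layer via the unembedding matrix (a logit-lens-style simplification that skips the final layer norm), and average across layers and prompt token positions. The prompt wording, model choice, and aggregation scheme below are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"  # hypothetical model choice
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
model.eval()

def internal_confidence(query: str) -> float:
    """Query-level confidence estimated before any answer tokens are generated."""
    prompt = (
        f"Question: {query}\n"
        "Are you able to answer this question correctly? Answer Yes or No."
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    yes_id = tokenizer.encode(" Yes", add_special_tokens=False)[0]

    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)

    scores = []
    # hidden_states: tuple of (num_layers + 1) tensors, shape [1, seq_len, hidden]
    for hidden in out.hidden_states[1:]:       # skip the embedding layer
        logits = model.lm_head(hidden)         # project each position onto the vocabulary
        probs = torch.softmax(logits.float(), dim=-1)
        # probability mass on "Yes" at every prompt position, averaged over tokens
        scores.append(probs[0, :, yes_id].mean())

    return torch.stack(scores).mean().item()   # average over layers
```

Such a score could then be thresholded to drive adaptive inference, e.g., only invoking RAG or escalating to a larger model when the confidence falls below a calibrated cutoff.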
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 13943