Keywords: Large Language Models, Frugal AI, Uncertainty Quantification
TL;DR: In this work, we propose a method to detect knowledge boundaries via Query-Level Uncertainty, which estimates whether a model is capable of answering a given query without generating any tokens.
Abstract: It is important for Large Language Models (LLMs) to be aware of the boundary of their knowledge, i.e., to identify which queries they know and which they do not. This type of awareness enables models to perform adaptive inference, such as invoking retrieval-augmented generation (RAG), engaging in slow and deep thinking, or abstaining from answering when appropriate. These mechanisms are beneficial to the development of efficient and trustworthy AI.
In this work, we propose a method to detect knowledge boundaries via \textbf{\emph{Query-Level Uncertainty}}, which estimates whether a model is capable of answering a given query before generating any tokens.
To this end, we propose a novel, training-free method called \textbf{\emph{Internal Confidence}}, which leverages self-evaluations across layers and tokens to provide a reliable signal of uncertainty.
Empirical studies on both factual question answering and mathematical reasoning tasks demonstrate that our internal confidence can outperform several baselines. Furthermore, we showcase that our proposed method can be used for adaptive inference, such as efficient RAG and model cascading, thereby reducing inference costs while preserving overall performance.
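A minimal sketch of the kind of query-level signal the abstract describes (not the authors' exact formulation): prompt the model to self-evaluate whether it can answer the query, read the probability of a "Yes" token from every intermediate layer via the unembedding matrix (a logit-lens-style simplification that skips the final layer norm), and average across layers and prompt token positions. The prompt wording, model choice, and aggregation scheme below are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"  # hypothetical model choice
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
model.eval()

def internal_confidence(query: str) -> float:
    """Query-level confidence estimated before any answer tokens are generated."""
    prompt = (
        f"Question: {query}\n"
        "Are you able to answer this question correctly? Answer Yes or No."
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    yes_id = tokenizer.encode(" Yes", add_special_tokens=False)[0]

    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)

    scores = []
    # hidden_states: tuple of (num_layers + 1) tensors, shape [1, seq_len, hidden]
    for hidden in out.hidden_states[1:]:       # skip the embedding layer
        logits = model.lm_head(hidden)         # project each position onto the vocabulary
        probs = torch.softmax(logits.float(), dim=-1)
        # probability mass on "Yes" at every prompt position, averaged over tokens
        scores.append(probs[0, :, yes_id].mean())

    return torch.stack(scores).mean().item()   # average over layers
```

Such a score could then be thresholded to drive adaptive inference, e.g., only invoking RAG or escalating to a larger model when the confidence falls below a calibrated cutoff.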
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 13943