Information Flow Reveals When to Trust Language Models

17 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: uncertainty quantification, large language models, retrieval-augmented generation
Abstract: Large language models (LLMs) have emerged as powerful tools for real-world applications, but their utility is often undermined by a fundamental flaw: a tendency toward overconfidence and guessing that leads to unreliable responses. This issue is particularly critical in retrieval-augmented generation (RAG), which is explicitly designed to provide answers factually grounded in retrieved context. Current approaches to quantifying LLM uncertainty are often inadequate, as they rely on surface signals from either the input embeddings or the output space, such as token probabilities or semantic consistency across multiple generations. This work opens the black box of transformers and assesses response reliability by analyzing the information flow within language models. Specifically, we uncover the contributions of context tokens to the generated output, providing an interpretable basis for evaluating reliability. From this analysis, we introduce two measures: simulatability, which assesses the alignment between the context token contributions and their relevance, and concentration, which quantifies the extent to which a response's support stems from a narrow subset of tokens. Our experiments demonstrate that these information-flow signals offer a more effective and interpretable basis for assessing reliability than existing methods, outperforming baselines across multiple metrics and advancing the development of more trustworthy LLM deployments. We also discuss computational considerations and the scope in which our method applies.
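To make the two measures concrete, the sketch below is a minimal, illustrative implementation assuming per-token contribution scores over the context are already available (e.g., from some attribution or information-flow method), together with relevance scores for the same tokens. The use of Spearman rank correlation for simulatability and an entropy-based score for concentration are assumptions for illustration, not the paper's exact definitions.

```python
import numpy as np
from scipy.stats import spearmanr


def simulatability(contributions, relevance):
    """Rank agreement between context-token contributions and their relevance.

    Higher values indicate the model draws on the tokens that are actually
    relevant. Spearman correlation is one illustrative choice of alignment.
    """
    rho, _ = spearmanr(contributions, relevance)
    return rho


def concentration(contributions, eps=1e-12):
    """How narrowly the response's support concentrates on a few context tokens.

    Contributions are normalized to a distribution; concentration is reported
    as 1 minus the normalized entropy (1 = all mass on one token, 0 = uniform).
    """
    c = np.clip(np.asarray(contributions, dtype=float), 0.0, None)
    p = c / (c.sum() + eps)
    entropy = -np.sum(p * np.log(p + eps))
    max_entropy = np.log(len(p))
    return 1.0 - entropy / (max_entropy + eps)


# Toy usage: hypothetical contributions over five context tokens and
# relevance scores (e.g., retriever similarities) for the same tokens.
contribs = [0.02, 0.60, 0.05, 0.30, 0.03]
relevance = [0.10, 0.90, 0.20, 0.70, 0.15]
print(simulatability(contribs, relevance))  # near 1: contributions track relevance
print(concentration(contribs))              # fairly high: support is peaked
```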
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 8447