Keywords: LLM, Uncertainty Quantification
TL;DR: Uncertainty Quantification for LLMs fails under Aleatoric Uncertainty
Abstract: Accurate uncertainty quantification (UQ) in Large Language Models (LLMs)
is critical for trustworthy deployment. While real-world language is inherently
ambiguous, existing UQ methods implicitly assume ambiguity-free scenarios.
This raises a natural question: how do these methods perform under ambiguity? In this work,
we demonstrate that current uncertainty estimators only perform well under the
restrictive assumption of no aleatoric uncertainty and degrade significantly on
ambiguous data. Specifically, we provide theoretical insights into this limitation
and introduce two question-answering (QA) datasets with ground-truth answer
probabilities. Using these datasets, we show that current uncertainty estimators
perform close to random under real-world ambiguity. This highlights a fundamental
limitation in existing practices and emphasizes the urgent need for new uncertainty
quantification approaches that account for the ambiguity inherent in language modeling.
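
As a rough illustration of the evaluation the abstract describes, the sketch below compares an estimator's uncertainty scores against the entropy of ground-truth answer distributions; all names and data here are hypothetical placeholders, not the authors' code or datasets. Under this setup, a rank correlation near zero would correspond to the "close to random" behavior reported in the abstract.

```python
# Hypothetical sketch: checking whether an uncertainty estimator tracks
# ground-truth answer ambiguity. Data and estimator are placeholders.
import numpy as np
from scipy.stats import spearmanr

# Ground-truth answer probabilities per question (as in an
# ambiguity-annotated QA dataset); the entropy of each distribution
# serves as the aleatoric uncertainty of that question.
gt_answer_probs = [
    np.array([0.5, 0.5]),   # fully ambiguous question
    np.array([0.9, 0.1]),   # mildly ambiguous question
    np.array([1.0]),        # unambiguous question
]

def entropy(p: np.ndarray) -> float:
    """Shannon entropy in nats; zero for unambiguous questions."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

gt_uncertainty = np.array([entropy(p) for p in gt_answer_probs])

# Scores from some LLM uncertainty estimator (e.g., predictive entropy
# over sampled answers); random values stand in for real estimates here.
rng = np.random.default_rng(0)
estimated_uncertainty = rng.random(len(gt_uncertainty))

# A rank correlation near 0 indicates the estimator is close to random
# at tracking true ambiguity.
rho, _ = spearmanr(gt_uncertainty, estimated_uncertainty)
print(f"Spearman correlation with ground-truth ambiguity: {rho:.2f}")
```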
Submission Number: 130