Keywords: LLM, Uncertainty Quantification
TL;DR: Uncertainty Quantification for LLMs fails under Aleatoric Uncertainty
Abstract: Accurate uncertainty quantification (UQ) in Large Language Models (LLMs)
is critical for trustworthy deployment. While real-world language is inherently
ambiguous, existing UQ methods implicitly assume ambiguity-free scenarios.
This raises a natural question: how do these methods perform under ambiguity? In this work,
we demonstrate that current uncertainty estimators only perform well under the
restrictive assumption of no aleatoric uncertainty and degrade significantly on
ambiguous data. Specifically, we provide theoretical insights into this limitation
and introduce two question-answering (QA) datasets with ground-truth answer
probabilities. Using these datasets, we show that current uncertainty estimators
perform close to random under real-world ambiguity. This highlights a fundamental
limitation in existing practices and emphasizes the urgent need for new uncertainty
quantification approaches that account for the ambiguity inherent in language modeling.
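
As a rough illustration of the evaluation the abstract describes, the sketch below compares an estimator's uncertainty scores against the entropy of ground-truth answer distributions; all names and data here are hypothetical placeholders, not the authors' code or datasets. Under this setup, a rank correlation near zero would correspond to the "close to random" behavior reported in the abstract.

```python
# Hypothetical sketch: checking whether an uncertainty estimator tracks
# ground-truth answer ambiguity. Data and estimator are placeholders.
import numpy as np
from scipy.stats import spearmanr

# Ground-truth answer probabilities per question (as in an
# ambiguity-annotated QA dataset); the entropy of each distribution
# serves as the aleatoric uncertainty of that question.
gt_answer_probs = [
    np.array([0.5, 0.5]),   # fully ambiguous question
    np.array([0.9, 0.1]),   # mildly ambiguous question
    np.array([1.0]),        # unambiguous question
]

def entropy(p: np.ndarray) -> float:
    """Shannon entropy in nats; zero for unambiguous questions."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

gt_uncertainty = np.array([entropy(p) for p in gt_answer_probs])

# Scores from some LLM uncertainty estimator (e.g., predictive entropy
# over sampled answers); random values stand in for real estimates here.
rng = np.random.default_rng(0)
estimated_uncertainty = rng.random(len(gt_uncertainty))

# A rank correlation near 0 indicates the estimator is close to random
# at tracking true ambiguity.
rho, _ = spearmanr(gt_uncertainty, estimated_uncertainty)
print(f"Spearman correlation with ground-truth ambiguity: {rho:.2f}")
```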
Submission Number: 130