Track: Long Paper Track (up to 9 pages)
Keywords: uncertainty, hallucination, trust, large language model
Abstract: Language models (LMs) often hallucinate. While uncertainty measures such as calibration scores give coarse estimates of model uncertainty (e.g., "This proof is 40% likely to be correct"), ideally a model could tell us what it is uncertain about, such as "I don't know how to find the length of side AB," enabling people to understand exactly where to trust a model's response.
We propose diagnostic uncertainty: open-ended descriptions of uncertainty that are grounded in model behavior. Our key idea is that a model can be said to be uncertain about X (e.g., "how to find the length of side AB") if its responses improve significantly after it is told X, and X occurs earliest in its reasoning process.
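As a rough illustration of this criterion (a sketch under our own assumptions, not the paper's implementation), the behavioral check can be framed as comparing sampled responses with and without the candidate description X supplied as a hint; the sampling and scoring helpers below are hypothetical.

```python
from typing import Callable, List

def improves_with_hint(
    sample_answers: Callable[[str, int], List[str]],  # hypothetical: samples n answers to a prompt
    score: Callable[[str], float],                     # hypothetical: grades an answer in [0, 1]
    question: str,
    x: str,            # candidate uncertainty description, e.g. "how to find the length of side AB"
    n: int = 16,
    margin: float = 0.1,
) -> bool:
    """Behavioral check: does telling the model X significantly improve its responses?"""
    baseline = [score(a) for a in sample_answers(question, n)]
    hinted = [score(a) for a in sample_answers(f"{question}\n\nHint: {x}", n)]
    # X counts as a source of uncertainty if average response quality improves by a margin.
    return sum(hinted) / n - sum(baseline) / n > margin
```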
We implement a method to bootstrap models' ability to generate these diagnostic uncertainty descriptions by iteratively training on sampled descriptions that satisfy these criteria.
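A minimal sketch of what such a bootstrapping loop could look like, assuming hypothetical helpers for sampling candidate descriptions, checking the criteria (e.g., the check sketched above), and fine-tuning; the actual training setup may differ.

```python
def bootstrap(model, questions, sample_descriptions, satisfies_criteria, finetune, rounds=3):
    """Iteratively train on sampled descriptions that pass the behavioral criteria.

    `sample_descriptions`, `satisfies_criteria`, and `finetune` are hypothetical helpers.
    """
    for _ in range(rounds):
        kept = []
        for q in questions:
            for x in sample_descriptions(model, q):
                if satisfies_criteria(model, q, x):   # e.g. an improves-with-hint check
                    kept.append((q, x))
        model = finetune(model, kept)                 # train to generate the accepted descriptions
    return model
```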
To evaluate whether diagnostic descriptions are meaningful, we provide the model with the information it claims to be uncertain about and measure whether its performance improves.
Compared to descriptions generated by prompting alone, resolving diagnostic uncertainty descriptions leads to 8% higher accuracy and a 20% larger reduction in the entropy of the answer distribution, supporting the hypothesis that diagnostic uncertainty is more faithful to the model's underlying uncertainty.
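For concreteness, the two evaluation quantities can be computed from sampled answers roughly as follows (an assumed formulation, not necessarily the paper's exact metric code):

```python
import math
from collections import Counter
from typing import List

def accuracy(answers: List[str], gold: str) -> float:
    """Fraction of sampled answers matching the reference answer."""
    return sum(a == gold for a in answers) / len(answers)

def answer_entropy(answers: List[str]) -> float:
    """Shannon entropy (bits) of the empirical distribution over sampled answers."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Usage: entropy_reduction = answer_entropy(before_hint) - answer_entropy(after_hint)
```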
The main contribution of our work is a framework for operationalizing open-ended uncertainty in LMs, enabling richer ways for people to understand LM behavior beyond raw probabilities.
Submission Number: 115