Abstract: Understanding the sources of a model's uncertainty about its predictions is crucial for effective human-AI collaboration. Prior work proposes conveying uncertainty through numerical estimates or verbal hedges (``I'm not sure, but...''), which cannot explain uncertainty arising from conflicting evidence, leaving users unable to resolve disagreements or rely on the output. We introduce CLUE (**C**onflict-&Agreement-aware **L**anguage-model **U**ncertainty **E**xplanations), the first framework to generate natural language explanations of model uncertainty by: (i) identifying, without supervision, relationships between spans of text that expose the claim-evidence or inter-evidence conflicts/agreements driving the model's predictive uncertainty; and (ii) generating explanations via prompting and attention steering that verbalize these critical interactions. Across three language models and two fact-checking datasets, we demonstrate that CLUE generates explanations that are more faithful to model uncertainty and more consistent with fact-checking decisions than a baseline that prompts for uncertainty explanations without span-interaction guidance. Human evaluators find our explanations more helpful, more informative, less redundant, and better logically aligned with the input than this prompting baseline. CLUE requires no fine-tuning or architectural changes, making it plug-and-play for any white-box language model. By explicitly linking uncertainty to evidence conflicts, it offers practical support for fact-checking and readily generalizes to other tasks that require reasoning over complex information.
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: natural language explanations, uncertainty, explanation faithfulness
Languages Studied: English
Submission Number: 3895