Uncertainty-Aware LLMs Fail to Flag Misleading Contexts

Published: 29 Sept 2025 · Last Modified: 12 Oct 2025 · NeurIPS 2025 Reliable ML Workshop · CC BY 4.0
Keywords: LLM safety and alignment, Robustness
TL;DR: Accurate context improves both accuracy and confidence, while misleading context yields confidently wrong outputs, exposing a misalignment between uncertainty estimates and correctness.
Abstract: Large Language Models (LLMs) are prone to generating fluent but incorrect content, known as confabulation, which poses growing risks in multi-turn or agentic applications where outputs may be reused as context. In this work, we investigate how in-context information influences model response behavior and whether LLMs can identify unreliable context. Specifically, we compute aleatoric and epistemic uncertainty from output logits to quantify response confidence. Through controlled experiments on open QA benchmarks, we find that correct in-context information improves both answer accuracy and model confidence, while misleading context often induces confidently incorrect responses, revealing a misalignment between uncertainty and correctness. These results underscore the limitations of logit-based uncertainty signals and highlight the need for reliability-aware generation in interactive agentic environments.
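To illustrate the kind of uncertainty quantification the abstract refers to, below is a minimal sketch of a standard entropy-based decomposition of predictive uncertainty computed from sampled output logits. This is an assumption about the general approach, not the paper's exact method; the function names, the sampling setup, and the toy data are hypothetical.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Convert logits to a probability distribution along the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z)
    return p / p.sum(axis=-1, keepdims=True)

def entropy(p: np.ndarray) -> np.ndarray:
    """Shannon entropy (in nats) of a probability distribution."""
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def uncertainty_decomposition(sampled_logits: np.ndarray):
    """Decompose predictive uncertainty from several sampled answer
    distributions (shape: [num_samples, vocab_size]).

    total     = H(mean_s p_s)      # predictive (total) uncertainty
    aleatoric = mean_s H(p_s)      # expected per-sample entropy
    epistemic = total - aleatoric  # mutual information between answer and sample
    """
    probs = softmax(sampled_logits)        # [S, V]
    total = entropy(probs.mean(axis=0))    # entropy of the averaged distribution
    aleatoric = entropy(probs).mean()      # average of per-sample entropies
    epistemic = total - aleatoric
    return total, aleatoric, epistemic

# Toy usage: 4 sampled answer-token distributions over a 5-symbol vocabulary.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 5))
print(uncertainty_decomposition(logits))
```

Under this decomposition, a misleading context that makes the model confidently wrong would show up as low total and low epistemic uncertainty despite an incorrect answer, which is the misalignment the abstract describes.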
Submission Number: 66