Keywords: Alignment, LLMs, Uncertainty, Hallucinations, Factuality, Safety
Abstract: Large Language Models (LLMs) have emerged as powerful tools for knowledge-intensive tasks, yet their tendency to generate factually incorrect or misleading outputs—commonly referred to as hallucinations—poses a fundamental challenge to their reliability. While uncertainty estimation is critical for mitigating such errors, LLMs are not explicitly trained to represent or express uncertainty. In this work, we investigate whether and how uncertainty is implicitly encoded within pretrained models. Through a probing-based analysis, we demonstrate that LLMs internalize multiple distinct and dataset-specific uncertainty signals, which can be extracted as linear directions in their latent space. These signals are most pronounced in intermediate layers, exhibit limited cross-task generalization, and are substantially enhanced by instruction-tuning and [IDK]-token training. Building on these findings, we propose a novel framework that leverages a unified uncertainty direction to train LLMs to classify their own correctness. Our experiments show that this approach significantly improves factual precision and reduces hallucination rates under zero-shot evaluation. Together, these results provide new insights into the internal structure of uncertainty in LLMs and introduce a practical method for aligning models toward more trustworthy behavior.
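The abstract describes extracting uncertainty signals as linear directions via probing of intermediate-layer hidden states. As a rough illustration of that idea (not the paper's actual setup), the sketch below fits a logistic-regression probe on last-token hidden states to toy correctness labels; the model name, prompts, labels, and layer choice are all illustrative assumptions.

```python
# Minimal sketch of a linear probe for an "uncertainty direction" in hidden states.
# Assumptions (not from the paper): GPT-2 as a stand-in model, toy prompt/label pairs,
# and the last-token hidden state of an intermediate layer as the probed representation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

model_name = "gpt2"  # placeholder; the paper studies larger pretrained / instruction-tuned LLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

# Toy probing data: prompts paired with a binary label (1 = model answered correctly).
# In practice these labels would come from scoring the model's answers on a QA dataset.
prompts = ["The capital of France is", "The capital of Australia is"]
labels = [1, 0]

layer = model.config.n_layer // 2  # probe an intermediate layer, where such signals are reported to be strongest

features = []
with torch.no_grad():
    for p in prompts:
        ids = tokenizer(p, return_tensors="pt")
        out = model(**ids)
        # hidden_states[0] is the embedding output, so index `layer` is the output of block `layer`
        h = out.hidden_states[layer][0, -1]  # last-token representation
        features.append(h.numpy())

# A logistic-regression probe: its weight vector is one candidate linear "uncertainty direction".
probe = LogisticRegression(max_iter=1000).fit(features, labels)
direction = probe.coef_[0]
print("probe direction shape:", direction.shape)
```

With real correctness labels, the learned `direction` could then be evaluated for cross-task transfer or used, as the abstract suggests, as a training signal for correctness classification; the helper setup here is purely hypothetical.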
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 14549