From Out-of-Distribution Detection to Hallucination Detection: A Geometric View

Published: 30 Apr 2026, Last Modified: 24 Jun 2026
Venue: ICML 2026 regular
License: CC BY 4.0
Abstract: Detecting hallucinations in large language models is a critical open problem with significant implications for safety and reliability. While existing hallucination detection methods perform well on question-answering tasks, they remain less effective on tasks that require reasoning. In this work, we revisit hallucination detection through the lens of out-of-distribution (OOD) detection, a well-studied problem in areas such as computer vision. Treating next-token prediction in language models as a classification task allows us to apply OOD techniques, provided we make appropriate modifications to account for the structural differences of large language models. We show that approaches based on OOD detection yield training-free, single-sample detectors that achieve strong accuracy in hallucination detection on reasoning tasks. Overall, our work suggests that reframing hallucination detection as OOD detection provides a promising and scalable pathway toward language model safety.
Lay Summary: Large language models can sometimes produce incorrect or unsupported answers, known as hallucinations. We develop a lightweight method to detect hallucinations by tracking uncertainty in the model’s next-word predictions. By identifying potentially unreliable model outputs, our method supports safer and more reliable AI deployment.
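The central move described above, treating each next-token distribution as the output of a classifier so that OOD scores can be computed on it, can be illustrated with two textbook OOD baselines: the maximum softmax probability (MSP) and the energy score. The sketch below is not the paper's method (its geometric modifications for language models are not described in this record); it assumes access to the raw next-token logits at every generated step, and the helper names (`msp_score`, `energy_score`, `sequence_score`, `flag_hallucination`), the min/mean aggregation, and the threshold value are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence

# Illustrative sketch only: standard OOD baselines applied to next-token logits.
# These are NOT the paper's detectors; all names and the threshold are hypothetical.


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax over one next-token distribution.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def msp_score(logits: Sequence[float]) -> float:
    # Maximum softmax probability: higher means a more confident prediction.
    return math.exp(max(log_softmax(logits)))


def energy_score(logits: Sequence[float], temperature: float = 1.0) -> float:
    # Negative free energy, T * logsumexp(logits / T): higher means more "in-distribution".
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    return temperature * (m + math.log(sum(math.exp(x - m) for x in scaled)))


def sequence_score(per_step_logits, score_fn=msp_score, reduce: str = "min") -> float:
    # Aggregate token-level scores over a generated answer (one logit vector per step).
    scores = [score_fn(step) for step in per_step_logits]
    if reduce == "min":
        return min(scores)
    if reduce == "mean":
        return sum(scores) / len(scores)
    raise ValueError(f"unknown reduction: {reduce!r}")


def flag_hallucination(per_step_logits, threshold: float, **kwargs) -> bool:
    # Flag the answer when its aggregated confidence falls below a (hypothetical) threshold.
    return sequence_score(per_step_logits, **kwargs) < threshold


# Toy usage: two generated steps over a 4-token vocabulary.
confident = [[5.0, 0.0, 0.0, 0.0], [6.0, 1.0, 0.0, 0.0]]
uncertain = [[1.0, 1.0, 1.0, 1.0], [5.0, 0.0, 0.0, 0.0]]
print(flag_hallucination(confident, threshold=0.5))  # False
print(flag_hallucination(uncertain, threshold=0.5))  # True
```

Taking the minimum over steps reflects the intuition that a single low-confidence token can mark a hallucinated span, while the mean gives a smoother score; in practice the threshold would have to be calibrated on held-out data. Because the abstract notes that OOD techniques need modifications to account for the structure of language models, raw MSP or energy over a large vocabulary should be read as a naive baseline rather than a faithful reproduction.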
Primary Area: Deep Learning->Large Language Models
Keywords: Uncertainty Quantification, Out-of-Distribution Detection, Hallucination Detection
Submission Number: 1485