Keywords: Hallucination Detection, Distribution Modeling, Normalizing Flow
Abstract: Despite remarkable advancements, large language models (LLMs) still frequently generate outputs containing factually incorrect or contextually irrelevant information, commonly known as hallucinations. Detecting these hallucinations accurately and efficiently remains an open challenge, especially without relying on labeled datasets. Current methods depend primarily on internal activations or on the consistency of multiple responses to a single prompt, which limits their ability to capture the global semantic and distributional structure of truthful outputs. Moreover, methods that estimate latent subspaces directly from mixed-quality data suffer from noise contamination and imprecise geometric representations. To address these limitations, we propose a novel Distance-Aware Distribution Modeling (DADM) framework that operates in two stages: first, an iterative distance-based process selects consistently truthful samples; second, we model their global distribution with normalizing flows, enabling accurate likelihood estimation by maximizing the likelihood of truthful samples and minimizing that of hallucinated samples. This two-stage design ensures both robust sample purification and expressive modeling of truthful generations, yielding interpretable confidence scores and more reliable hallucination detection. Extensive experiments on benchmark datasets demonstrate that our method consistently outperforms prior unsupervised approaches across multiple LLM settings.
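The following is a minimal sketch of the two-stage idea described in the abstract, assuming fixed-dimensional embeddings of LLM responses as input. The helper names (`select_truthful`, `SimpleFlow`), the centroid-distance selection rule, and the RealNVP-style coupling layers are illustrative assumptions rather than the paper's actual implementation; the contrastive term that lowers the likelihood of hallucinated samples is omitted for brevity.

```python
# Hedged sketch: iterative distance-based sample purification followed by
# normalizing-flow likelihood scoring. Names and hyperparameters are assumptions.
import math
import torch
import torch.nn as nn


def select_truthful(embeddings: torch.Tensor, n_iters: int = 5, keep_frac: float = 0.7):
    """Stage 1 (assumed form): repeatedly keep the samples closest to the
    current centroid, as a stand-in for the distance-aware selection."""
    idx = torch.arange(embeddings.size(0))
    for _ in range(n_iters):
        kept = embeddings[idx]
        centroid = kept.mean(dim=0, keepdim=True)
        dists = torch.cdist(kept, centroid).squeeze(1)
        k = max(1, int(keep_frac * kept.size(0)))
        idx = idx[torch.topk(-dists, k).indices]   # keep the k closest samples
    return idx


class AffineCoupling(nn.Module):
    """Single RealNVP-style affine coupling layer."""
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=1)
        s = torch.tanh(s)                          # bound the scale for stability
        y2 = x2 * torch.exp(s) + t
        return torch.cat([x1, y2], dim=1), s.sum(dim=1)


class SimpleFlow(nn.Module):
    """Stack of coupling layers with a feature flip between layers."""
    def __init__(self, dim: int, n_layers: int = 4):
        super().__init__()
        self.dim = dim
        self.layers = nn.ModuleList(AffineCoupling(dim) for _ in range(n_layers))

    def log_prob(self, x):
        log_det = torch.zeros(x.size(0))
        for layer in self.layers:
            x, ld = layer(x)
            log_det = log_det + ld
            x = x.flip(dims=[1])                   # cheap permutation so both halves get transformed
        base = -0.5 * (x ** 2).sum(dim=1) - 0.5 * self.dim * math.log(2 * math.pi)
        return base + log_det


if __name__ == "__main__":
    torch.manual_seed(0)
    dim = 32
    responses = torch.randn(256, dim)              # stand-in for LLM response embeddings
    truthful_idx = select_truthful(responses)      # stage 1: purify samples

    flow = SimpleFlow(dim)
    opt = torch.optim.Adam(flow.parameters(), lr=1e-3)
    for _ in range(200):                           # stage 2: maximize likelihood of selected samples
        opt.zero_grad()
        loss = -flow.log_prob(responses[truthful_idx]).mean()
        loss.backward()
        opt.step()

    scores = flow.log_prob(responses)              # low log-likelihood -> likely hallucination
    print(scores[:5])
```

In this reading, the flow's log-likelihood of a response embedding serves directly as the interpretable confidence score mentioned in the abstract.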
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 18538