Keywords: Hallucination Detection; Manifold
Abstract: Large language models (LLMs) exhibit remarkable capabilities across various tasks but are prone to generating hallucinations, raising significant concerns about their reliability. Existing approaches for detecting hallucinations in unlabeled, real-world data often utilize information from the latent feature space. However, these studies have not thoroughly analyzed the sample distributions within this space and typically rely on linear separation methods. To better characterize these distributions, we introduce Hallucination Attention Regions (HARs) and True Attention Regions (TARs) to model the latent-space representations of hallucinated and truthful samples, respectively. Our empirical analysis reveals that HARs and TARs are nonlinearly separable. Based on this finding, we hypothesize that these high-dimensional distributions can be embedded into a low-dimensional manifold. We thus propose the HDME framework for automatically detecting hallucinations in unlabeled data. This framework comprises three steps: (1) projecting high-dimensional samples onto a low-dimensional manifold, (2) clustering the embedded data to generate pseudo-labels, and (3) training a hallucination detector with these pseudo-labels. Extensive experiments demonstrate that our method achieves superior performance in hallucination detection across diverse datasets.
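The three-step pipeline in the abstract can be sketched as follows. This is a minimal illustration with stand-in components (Isomap for the manifold projection, KMeans for clustering, logistic regression as the detector) and synthetic features; the paper's actual HDME embedding, clustering method, and detector architecture may differ.

```python
import numpy as np
from sklearn.manifold import Isomap
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic stand-ins for high-dimensional latent features of LLM samples;
# shifting half of them mimics two regions (hallucinated vs. truthful).
X = rng.normal(size=(200, 64))
X[:100] += 2.0

# Step 1: project high-dimensional samples onto a low-dimensional manifold.
Z = Isomap(n_components=2, n_neighbors=10).fit_transform(X)

# Step 2: cluster the embedded data to generate pseudo-labels.
pseudo = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)

# Step 3: train a hallucination detector with these pseudo-labels.
detector = LogisticRegression(max_iter=1000).fit(X, pseudo)
```

The detector can then score unseen samples via `detector.predict_proba`, without requiring any gold hallucination labels.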
Primary Area: foundation or frontier models, including LLMs
Submission Number: 10424