The Digital Dunning-Kruger Effect: Decoupling Hallucinations via Geometric Hidden-state Observation for Semantic Truthfulness

ACL ARR 2026 January Submission2505 Authors

03 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0
Keywords: Hallucination Detection, Large Language Models, Geometric Hidden-state Analysis, Digital Dunning-Kruger Effect, White-box Interpretability, Natural Language Processing
Abstract: Large Language Models (LLMs) often generate hallucinations: overconfident yet factually incorrect outputs. Current detection paradigms face a trade-off: black-box methods are accurate but computationally expensive, while white-box methods fail to detect stubborn hallucinations. To bridge this gap, we propose GHOST (Geometric Hidden-state Observation for Semantic Truthfulness), an efficient white-box framework for hallucination detection in LLMs. We distinguish between two hallucination mechanisms: confused hallucinations, marked by internal reasoning instability, and stubborn hallucinations, characterized by premature layer-wise convergence. By integrating internal geometric dynamics with output probability distributions, GHOST constructs a high-dimensional feature space for non-linear truthfulness classification. Extensive evaluations on FinanceBench, RAGTruth, HaluEval, and PopQA show that GHOST outperforms white-box baselines and achieves performance competitive with black-box methods while reducing computational overhead by over 90%, offering a robust solution for real-time detection.
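The abstract does not specify which features GHOST extracts, so the following is only a minimal illustrative sketch of what combining layer-wise geometric signals with output-probability signals could look like. Everything here is an assumption rather than the authors' method: the function names `cosine` and `trajectory_features`, the threshold `tau`, and the four features (variance-style instability of consecutive-layer similarity, a normalized "settling depth", output entropy, and top-token probability) are all hypothetical. The downstream non-linear classifier is not shown.

```python
import math

def cosine(u, v, eps=1e-12):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)

def trajectory_features(hidden, probs, tau=0.99):
    """Hypothetical per-token features (not taken from the paper).

    hidden: list of per-layer hidden-state vectors for one token.
    probs:  output probability distribution for that token.
    tau:    assumed similarity threshold for calling two layers "converged".
    """
    # Similarity between each pair of consecutive layers.
    sims = [cosine(hidden[i], hidden[i + 1]) for i in range(len(hidden) - 1)]

    # Instability proxy: standard deviation of consecutive-layer similarity.
    mean = sum(sims) / len(sims)
    instability = math.sqrt(sum((s - mean) ** 2 for s in sims) / len(sims))

    # Settling depth: earliest transition after which every similarity stays
    # at or above tau, as a fraction of depth. A small value means the
    # representation stopped changing early in the network.
    settle = len(sims)
    for i in range(len(sims) - 1, -1, -1):
        if sims[i] < tau:
            break
        settle = i
    depth = settle / len(sims)

    # Output-distribution signals: Shannon entropy and top-token probability.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return [instability, depth, entropy, max(probs)]
```

Under these assumptions, a token whose hidden state settles early while the output distribution is sharply peaked would show a low settling depth and low entropy, which is one way to read the "stubborn" case; a high instability value would correspond to the "confused" case. A feature vector like this would then be passed to a separately trained classifier.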
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Interpretability and Analysis of Models for NLP, Language Modeling
Contribution Types: Model analysis & interpretability, Approaches low compute settings-efficiency, Data analysis
Languages Studied: English
Submission Number: 2505