The Digital Dunning-Kruger Effect: Decoupling Hallucinations via Geometric Hidden-state Observation for Semantic Truthfulness

ACL ARR 2026 January Submission2505 Authors

03 Jan 2026 (modified: 20 Mar 2026) | ACL ARR 2026 January Submission | CC BY 4.0

Keywords: Hallucination Detection, Large Language Models, Geometric Hidden-state Analysis, Digital Dunning-Kruger Effect, White-box Interpretability, Natural Language Processing

Abstract: Large Language Models (LLMs) often generate overconfident yet factually incorrect hallucinations. Current detection paradigms face a trade-off: black-box methods are accurate but computationally expensive, while white-box methods fail to detect stubborn hallucinations. To bridge this gap, we propose GHOST (Geometric Hidden-state Observation for Semantic Truthfulness), an efficient white-box framework for hallucination detection in LLMs. We distinguish between two hallucination mechanisms: confused hallucinations, marked by internal reasoning instability, and stubborn hallucinations, characterized by premature layer-wise convergence. By integrating internal geometric dynamics with output probability distributions, GHOST constructs a high-dimensional feature space for non-linear truthfulness classification. Extensive evaluations on FinanceBench, RAGTruth, HaluEval, and PopQA show that GHOST outperforms white-box baselines and achieves performance competitive with black-box methods while reducing computational overhead by over 90%, offering a robust solution for real-time detection.
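
The two mechanisms named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the feature names (`convergence_depth`, `instability`, `output_entropy`), the cosine-similarity measure, and the `threshold` value are all assumptions chosen for illustration, and the abstract does not specify how GHOST actually computes its geometric features. The sketch takes one token's hidden-state vector at each layer plus the output probability distribution and derives three scalar features that a downstream non-linear classifier could consume.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def trajectory_features(hidden_states, output_probs, threshold=0.95):
    """Hypothetical features for one token.

    hidden_states: list of per-layer vectors, ordered first to last layer.
    output_probs:  output distribution over candidate tokens.
    threshold:     assumed similarity level that counts as "converged".
    """
    n = len(hidden_states)
    final = hidden_states[-1]
    sim_to_final = [cosine(h, final) for h in hidden_states]

    # Earliest layer from which similarity to the final state stays above
    # the threshold. A small value means the representation settled early,
    # which is one possible reading of "premature layer-wise convergence".
    conv = n - 1
    for i in range(n - 1, -1, -1):
        if sim_to_final[i] >= threshold:
            conv = i
        else:
            break
    convergence_depth = conv / (n - 1) if n > 1 else 0.0

    # Variance of layer-to-layer direction changes. A large value is one
    # possible proxy for "internal reasoning instability".
    steps = [1.0 - cosine(hidden_states[i], hidden_states[i + 1])
             for i in range(n - 1)]
    mean_step = sum(steps) / len(steps) if steps else 0.0
    instability = (sum((s - mean_step) ** 2 for s in steps) / len(steps)
                   if steps else 0.0)

    return {
        "convergence_depth": convergence_depth,
        "instability": instability,
        "output_entropy": entropy(output_probs),
    }


# Toy 4-layer trajectory that settles at layer 1 and never moves again.
early = trajectory_features([[1, 0], [0, 1], [0, 1], [0, 1]], [0.5, 0.5])
# Toy trajectory that only reaches its final direction at the last layer.
late = trajectory_features([[1, 0], [1, 0], [1, 0], [0, 1]], [0.5, 0.5])
```

In this toy setup, `early` has a lower `convergence_depth` than `late`. Under the assumptions above, a low depth paired with a confident (low-entropy) output would be the pattern of interest for stubborn hallucinations, while a high `instability` value would point toward the confused kind.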

Paper Type: Long

Research Area: Interpretability and Analysis of Models for NLP

Research Area Keywords: Interpretability and Analysis of Models for NLP, Language Modeling

Contribution Types: Model analysis & interpretability, Approaches low compute settings-efficiency, Data analysis

Languages Studied: English

Submission Number: 2505