Bold Claims or Self-Doubt? Factuality Hallucination Type Detection via Belief State

ACL ARR 2025 February Submission 6992 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Large language models are prone to generating hallucinations that deviate from factual information. Existing studies mainly focus on detecting the presence of hallucinations but lack a systematic classification approach, which hinders deeper exploration of their characteristics. To address this, we introduce the concept of a belief state, which quantifies the model's confidence in its own responses. We define the model's belief state based on self-consistency, using answer repetition rates to label confident and uncertain states. On this basis, we categorize factuality hallucinations into two types: Overconfident Hallucination and Unaware Hallucination. Furthermore, we propose $\textbf{BAFH}$, a factuality hallucination type detection method. By training a classifier on the model's hidden states, we establish a link between hidden states and belief states, enabling efficient and automatic hallucination type detection. Experimental results demonstrate the effectiveness of BAFH and the differences between hallucination types.
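The pipeline sketched in the abstract (self-consistency labeling of belief states, then a classifier from hidden states to those labels) can be illustrated with a minimal Python sketch. The repetition-rate threshold `TAU`, the function names, and the logistic-regression probe are illustrative assumptions; the abstract does not specify the paper's actual classifier or cutoff.

```python
from collections import Counter
from typing import List

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical threshold on the answer repetition rate; the paper's
# exact cutoff is not given in the abstract.
TAU = 0.7


def belief_state(answers: List[str], tau: float = TAU) -> str:
    """Label a question 'confident' or 'uncertain' via self-consistency:
    the repetition rate of the most frequent answer across samples."""
    top_count = Counter(answers).most_common(1)[0][1]
    repetition_rate = top_count / len(answers)
    return "confident" if repetition_rate >= tau else "uncertain"


def hallucination_type(belief: str, is_correct: bool) -> str:
    """Map (belief state, factual correctness) to a hallucination type,
    following the two categories named in the abstract."""
    if is_correct:
        return "not a hallucination"
    return "overconfident" if belief == "confident" else "unaware"


def train_probe(hidden_states: np.ndarray, belief_labels: List[str]) -> LogisticRegression:
    """Train a linear probe from model hidden states to belief-state labels,
    standing in for the BAFH classifier (an assumption, not the paper's code)."""
    probe = LogisticRegression(max_iter=1000)
    probe.fit(hidden_states, belief_labels)
    return probe


if __name__ == "__main__":
    # Five sampled answers to the same question; repetition rate 0.8.
    answers = ["Paris", "Paris", "Lyon", "Paris", "Paris"]
    b = belief_state(answers)
    print(hallucination_type(b, is_correct=False))  # -> 'overconfident'
```

The probe step is what makes detection cheap at inference time: once trained, hallucination types can be predicted from a single forward pass's hidden states rather than by repeated sampling.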
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: knowledge tracing/discovering/inducing, probing, robustness, calibration/uncertainty
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 6992