Abstract: As large language models (LLMs) often generate plausible but incorrect content, error detection has become increasingly critical to ensure truthfulness.
However, existing detection methods often overlook a critical problem we term **self-consistent error**, where an LLM repeatedly generates the *same* incorrect response across multiple stochastic samples.
This work formally defines self-consistent errors and evaluates mainstream detection methods on them.
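As a minimal, illustrative sketch of the phenomenon (not the paper's evaluation protocol), a self-consistent error can be distinguished from an inconsistent one by repeatedly sampling the same question and checking whether the dominant answer agrees across samples yet is wrong; the `sample_fn` interface, the consistency threshold, and the exact-match check below are assumptions made for illustration:

```python
from collections import Counter

def classify_error(sample_fn, question, gold_answer, n_samples=10, consistency_threshold=0.8):
    """Hypothetical helper: sample_fn(question) returns one stochastically decoded answer string."""
    answers = [sample_fn(question) for _ in range(n_samples)]
    top_answer, count = Counter(answers).most_common(1)[0]
    consistent = count / n_samples >= consistency_threshold  # same answer dominates the samples
    correct = top_answer == gold_answer                       # exact match used only for simplicity
    if correct:
        return "correct"
    return "self-consistent error" if consistent else "inconsistent error"
```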
Our investigation reveals two key findings:
(1) Unlike inconsistent errors, whose frequency diminishes significantly as LLM scale increases, the frequency of self-consistent errors remains stable or even increases.
(2) All four categories of mainstream detection methods we evaluate struggle significantly to detect self-consistent errors.
These findings reveal critical limitations in current detection methods and underscore the need for improved methods.
Motivated by the observation that self-consistent errors often differ across LLMs, we propose a simple but effective *cross‑model probe* method that fuses hidden state evidence from an external verifier LLM.
Our method significantly enhances performance on self-consistent errors across three LLM families.
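A minimal sketch of one way such a cross-model probe could be realized, assuming last-token hidden states are extracted from both the generator LLM and an external verifier LLM for the same (question, response) pair and fed to a small trained classifier; the dimensions, architecture, and `CrossModelProbe` name are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class CrossModelProbe(nn.Module):
    """Probe over fused hidden states from a generator LLM and an external verifier LLM."""

    def __init__(self, gen_dim=4096, verifier_dim=4096, hidden=256):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(gen_dim + verifier_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, gen_hidden, verifier_hidden):
        # gen_hidden:      [batch, gen_dim]      last-token hidden state of the generator
        # verifier_hidden: [batch, verifier_dim] last-token hidden state of the verifier
        fused = torch.cat([gen_hidden, verifier_hidden], dim=-1)
        return torch.sigmoid(self.classifier(fused))  # predicted probability that the response is an error
```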
Paper Type: Short
Research Area: NLP Applications
Research Area Keywords: fact checking, rumor/misinformation detection
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Keywords: hallucination, hallucination detection, uncertainty estimation, overconfidence
Submission Number: 435