Abstract: As large language models (LLMs) often generate plausible but incorrect content, error detection has become increasingly critical to ensure truthfulness.
However, existing detection methods often overlook a critical problem we term **self-consistent error**, where an LLM repeatedly generates the *same* incorrect response across multiple stochastic samples.
This work formally defines self-consistent errors and evaluates mainstream detection methods on them.
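As a minimal, illustrative sketch of the phenomenon (not the paper's evaluation protocol), a self-consistent error can be distinguished from an inconsistent one by repeatedly sampling the same question and checking whether the dominant answer agrees across samples yet is wrong; the `sample_fn` interface, the consistency threshold, and the exact-match check below are assumptions made for illustration:

```python
from collections import Counter

def classify_error(sample_fn, question, gold_answer, n_samples=10, consistency_threshold=0.8):
    """Hypothetical helper: sample_fn(question) returns one stochastically decoded answer string."""
    answers = [sample_fn(question) for _ in range(n_samples)]
    top_answer, count = Counter(answers).most_common(1)[0]
    consistent = count / n_samples >= consistency_threshold  # same answer dominates the samples
    correct = top_answer == gold_answer                       # exact match used only for simplicity
    if correct:
        return "correct"
    return "self-consistent error" if consistent else "inconsistent error"
```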
Our investigation reveals two key findings:
(1) Unlike inconsistent errors, whose frequency diminishes significantly as LLM scale increases, the frequency of self-consistent errors remains stable or even increases.
(2) All four categories of mainstream detection methods we evaluate struggle significantly to detect self-consistent errors.
These findings reveal critical limitations in current detection methods and underscore the need for improved methods.
Motivated by the observation that self-consistent errors often differ across LLMs, we propose a simple but effective *cross‑model probe* method that fuses hidden state evidence from an external verifier LLM.
Our method significantly enhances performance on self-consistent errors across three LLM families.
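A minimal sketch of one way such a cross-model probe could be realized, assuming last-token hidden states are extracted from both the generator LLM and an external verifier LLM for the same (question, response) pair and fed to a small trained classifier; the dimensions, architecture, and `CrossModelProbe` name are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class CrossModelProbe(nn.Module):
    """Probe over fused hidden states from a generator LLM and an external verifier LLM."""

    def __init__(self, gen_dim=4096, verifier_dim=4096, hidden=256):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(gen_dim + verifier_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, gen_hidden, verifier_hidden):
        # gen_hidden:      [batch, gen_dim]      last-token hidden state of the generator
        # verifier_hidden: [batch, verifier_dim] last-token hidden state of the verifier
        fused = torch.cat([gen_hidden, verifier_hidden], dim=-1)
        return torch.sigmoid(self.classifier(fused))  # predicted probability that the response is an error
```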
Paper Type: Short
Research Area: NLP Applications
Research Area Keywords: fact checking, rumor/misinformation detection
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Keywords: hallucination, hallucination detection, uncertainty estimation, overconfidence
Submission Number: 435