Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency

ACL ARR 2026 January Submission 2283 Authors

02 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: knowledge belief, knowledge learning
Abstract: As Large Language Models (LLMs) are increasingly deployed in real-world settings, correctness alone is insufficient: reliable deployment requires maintaining truthful beliefs under contextual perturbations. Existing evaluations largely rely on point-wise confidence measures such as Self-Consistency, which can mask brittle beliefs. We show that even facts answered with perfect self-consistency can rapidly collapse under mild contextual interference. To address this gap, we propose \textbf{Neighbor-Consistency Belief (NCB)}, a structural measure of belief robustness that evaluates response coherence across a conceptual neighborhood. To validate the effectiveness of NCB, we introduce a new \textbf{cognitive stress-testing protocol} that probes output stability under contextual interference. Experiments across multiple LLMs show that performance on high-NCB facts is more resistant to interference than performance on low-NCB facts. Finally, we present \textbf{Structure-Aware Training (SAT)}, which optimizes for a context-invariant belief structure and reduces long-tail knowledge brittleness by approximately \textbf{30\%}.
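A minimal sketch of how a neighbor-consistency score of this kind might be computed; this is not the paper's implementation, and the `answer` callable, the toy neighborhood, and the majority-agreement scoring are illustrative assumptions only:

```python
# Illustrative sketch (assumptions, not the authors' method): score belief
# robustness as agreement of a model's answers across a conceptual
# neighborhood of related prompts, rather than repeated samples of one prompt.
from collections import Counter
from typing import Callable, List


def ncb_score(answer: Callable[[str], str], neighborhood: List[str]) -> float:
    """Return the fraction of neighborhood answers that agree with the
    most common answer (1.0 = fully coherent neighborhood)."""
    answers = [answer(prompt).strip().lower() for prompt in neighborhood]
    if not answers:
        return 0.0
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)


if __name__ == "__main__":
    # Toy stand-in for an LLM call: a fixed lookup table, used only to make
    # the sketch runnable end to end.
    toy_model = {
        "What is the capital of France?": "Paris",
        "Which city is France's capital?": "Paris",
        "The Eiffel Tower stands in which capital city?": "Paris",
        "Which capital hosts the Louvre?": "Lyon",  # an inconsistent neighbor
    }
    neighborhood = list(toy_model.keys())
    print(ncb_score(lambda p: toy_model.get(p, ""), neighborhood))  # 0.75
```

In this reading, a fact answered identically under repeated sampling of a single prompt (high self-consistency) can still receive a low NCB score if the model contradicts itself on closely related prompts, which is the gap the abstract describes.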
Paper Type: Long
Research Area: Language Models
Research Area Keywords: safety and alignment, robustness, continual learning
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 2283