Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency

ACL ARR 2026 January Submission 2283 Authors

02 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: knowledge belief, knowledge learning
Abstract: As Large Language Models (LLMs) are increasingly deployed in real-world settings, correctness alone is insufficient: reliable deployment requires maintaining truthful beliefs under contextual perturbations. Existing evaluations largely rely on point-wise confidence measures such as Self-Consistency, which can mask brittle beliefs. We show that even facts answered with perfect self-consistency can rapidly collapse under mild contextual interference. To address this gap, we propose \textbf{Neighbor-Consistency Belief (NCB)}, a structural measure of belief robustness that evaluates response coherence across a conceptual neighborhood. To validate the effectiveness of NCB, we introduce a new \textbf{cognitive stress-testing protocol} that probes output stability under contextual interference. Experiments across multiple LLMs show that performance on high-NCB facts is more resistant to interference than performance on low-NCB facts. Finally, we present \textbf{Structure-Aware Training (SAT)}, which optimizes for a context-invariant belief structure and reduces long-tail knowledge brittleness by approximately \textbf{30\%}.
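A minimal sketch of how a neighbor-consistency score of this kind might be computed; this is not the paper's implementation, and the `answer` callable, the toy neighborhood, and the majority-agreement scoring are illustrative assumptions only:

```python
# Illustrative sketch (assumptions, not the authors' method): score belief
# robustness as agreement of a model's answers across a conceptual
# neighborhood of related prompts, rather than repeated samples of one prompt.
from collections import Counter
from typing import Callable, List


def ncb_score(answer: Callable[[str], str], neighborhood: List[str]) -> float:
    """Return the fraction of neighborhood answers that agree with the
    most common answer (1.0 = fully coherent neighborhood)."""
    answers = [answer(prompt).strip().lower() for prompt in neighborhood]
    if not answers:
        return 0.0
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)


if __name__ == "__main__":
    # Toy stand-in for an LLM call: a fixed lookup table, used only to make
    # the sketch runnable end to end.
    toy_model = {
        "What is the capital of France?": "Paris",
        "Which city is France's capital?": "Paris",
        "The Eiffel Tower stands in which capital city?": "Paris",
        "Which capital hosts the Louvre?": "Lyon",  # an inconsistent neighbor
    }
    neighborhood = list(toy_model.keys())
    print(ncb_score(lambda p: toy_model.get(p, ""), neighborhood))  # 0.75
```

In this reading, a fact answered identically under repeated sampling of a single prompt (high self-consistency) can still receive a low NCB score if the model contradicts itself on closely related prompts, which is the gap the abstract describes.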
Paper Type: Long
Research Area: Language Models
Research Area Keywords: safety and alignment, robustness, continual learning
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 2283