LLMInertia: Investigating and Mitigating Large Language Models' Unfaithfulness to Input Evidence from a Cognitive Inertia Perspective
Keywords: Large Language Models, Faithfulness Hallucination, Input Evidence, Co-occurrence Bias
Abstract: Large Language Models (LLMs) frequently generate output that contradicts or disregards explicit input evidence, limiting their reliability across diverse applications. We identify cognitive inertia in LLMs, the tendency to rely excessively on co-occurrence associations even when confronted with new or contradictory input evidence, as an important contributing factor to such hallucinations. Through targeted experiments, we show that LLM adherence to explicit input evidence decreases as the strength of co-occurrence associations in pretraining data increases. Inspired by human counter-inertial reasoning, we propose LLMInertia, an adaptive counter-inertial reasoning framework that probes the cognitive inertia an LLM exhibits for a given input and generates adaptive counter-inertial reminders, which are then injected into the prompt to promote more faithful, evidence-based reasoning. Experimental results on co-occurrence-induction datasets show that LLMInertia significantly reduces hallucination induction rates by 14.16% and improves accuracy by 12.72% on average. Comprehensive evaluations on four summarization and question-answering datasets, using three different LLM backbones, further demonstrate the effectiveness and robustness of our approach, highlighting a promising direction for developing more reliable LLM applications.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 8867