Keywords: Knowledge Editing, Multi-hop Reasoning, Large Language Models, Chain-of-Thought, Interpretability, Hallucination
Abstract: Knowledge editing aims to efficiently update LLMs, yet generalization to multi-hop reasoning remains a bottleneck. While Chain-of-Thought (CoT) is often proposed as a solution, we reveal a critical structural asymmetry. Evaluating ROME on Gravity-QA, we find that CoT successfully bridges reasoning gaps in rigid domains (Geography, ~71% success) but fails in flexible domains (Humanities, ~46%), where it often devolves into Cognitive Collapse (hallucination). We attribute this asymmetry to Structural Rigidity: functional uniqueness ensures epistemic clarity, while relational ambiguity invites fabrication. Through a pathological analysis, we conceptualize failures as a spectrum of conflict responses: models may Halt (Activation Blockage), Fight (Semantic Rejection), or Fantasize (Cognitive Collapse). This warns that editing in high-entropy domains carries a higher risk of silent hallucination.
Paper Type: Short
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: model editing, explanation faithfulness, robustness, multihop QA, chain-of-thought
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 8585