The Latent Cause Blind Spot: an Empirical Study of Update Types and Their Collateral Effects on LLMs

ICLR 2026 Conference Submission17460 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: continual learning, catastrophic forgetting, large language models, episodic memory
TL;DR: In a continual-learning setup, we test 14 LLM update types and show that effects on prior knowledge are update-type dependent: some behavior-imprinting updates trigger catastrophic forgetting, whereas temporal contextualization preserves knowledge.
Abstract: The ability to create new memories while preserving existing ones is fundamental to intelligent learning systems. Biological learners use prediction error to decide between modifying existing memories and creating new ones, assigning surprising evidence to new "latent causes". Large language models lack this selectivity: gradient updates treat confirmations and contradictions alike, with potentially catastrophic consequences. We introduce a comprehensive framework for evaluating knowledge-update effects across domains and contexts, contributing 14 distinct update datasets (230k samples, 11 newly created) that systematically vary surprise and contextual framing across factual, ethical, and code examples. After fine-tuning Llama, Mistral, and GPT variants, we measure collateral effects on an unrelated cross-domain set. Results show that (1) learning raw contradictions causes severe degradation, driving factual accuracy on unrelated probes below 5% in some settings; (2) explicit temporal contextualization that mimics human-like new-memory creation largely preserves unrelated knowledge, making contradictory updates behave like non-conflicting ones; and (3) some fine-tunes create transferable "habits" that generalize across domains (e.g., fine-tuning on code makes models answer questions in pseudo-code), although style-only changes (e.g., longer sentences) preserve underlying knowledge. Overall, these results identify contextualization and update-induced habits as primary determinants of update safety, pointing to practical directions for continual learning.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 17460