Keywords: Explainable AI, Mechanistic Interpretability, Large Language Models
Abstract: Large language models (LLMs) store vast amounts of knowledge, which often requires updates to correct factual errors, incorporate newly acquired information, or adapt model behavior. Model editing methods have emerged as efficient solutions for such updates, offering localized and precise knowledge modification at significantly lower computational cost than continual training. In parallel, LLMs are frequently fine-tuned for a wide range of downstream tasks. However, the effect of fine-tuning on previously edited knowledge remains poorly understood. In this work, we systematically investigate how different fine-tuning objectives interact with various model editing techniques. \textbf{Our findings show that edited knowledge is more easily forgotten during fine-tuning than intrinsic knowledge acquired through pre-training, revealing a fundamental distinction between post-hoc edits and native model knowledge.} This analysis highlights a key limitation of current editing approaches and suggests that evaluating edit robustness under downstream fine-tuning is critical for their practical deployment. We further find that knowledge retention can be significantly improved either by augmenting the edited knowledge with paraphrases or by freezing the layers associated with edited content during fine-tuning, offering insights for developing more robust editing algorithms.
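For illustration, below is a minimal sketch of the layer-freezing mitigation mentioned in the abstract, assuming a Hugging Face Transformers GPT-2 model. The indices in edited_layer_ids are hypothetical placeholders; in practice they would be the layers modified by the editing method (e.g., the MLP layers targeted by a locate-then-edit editor).

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical: transformer blocks touched by a prior knowledge edit.
edited_layer_ids = {5, 6}

for name, param in model.named_parameters():
    # Freeze every parameter belonging to an edited block so that
    # downstream fine-tuning cannot overwrite the injected knowledge.
    if any(f"transformer.h.{i}." in name for i in edited_layer_ids):
        param.requires_grad = False

# Fine-tune as usual; only the non-frozen parameters receive gradient updates.
trainable_params = [p for p in model.parameters() if p.requires_grad]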
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: model editing, robustness
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 6369