Edit-then-Consolidate for Reliable Knowledge Editing

ICLR 2026 Conference Submission 15570 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Knowledge Editing
Abstract: Knowledge editing aims to update specific facts in large language models (LLMs) without full retraining. Prior efforts tune the knowledge layers of LLMs and prove effective at making selective edits. However, a significant gap emerges between their effectiveness under controlled teacher-forcing evaluation and their performance in real-world lifelong-editing settings, which severely limits their practical applicability. In this work, we show that this gap arises from two key issues: (1) existing methods cause the edited model to overfit to new facts, degrading pre-trained capabilities; and (2) the absence of a knowledge consolidation stage prevents new facts from being integrated into the LLM's reasoning policy, producing a mismatch between parametric knowledge and reasoning behavior. To this end, we propose Edit-then-Consolidate, a novel knowledge editing paradigm that bridges the gap between knowledge editing methods in theory and their real-world applicability. Specifically, (1) our framework mitigates overfitting via Targeted Proximal Supervised Fine-Tuning (TPSFT), which localizes the edit and applies a trust-region objective to limit policy drift; (2) a subsequent consolidation stage uses Group Relative Policy Optimization (GRPO) to align the edited knowledge with multi-step reasoning by optimizing trajectory-level behavior under comprehensive reward signals. Extensive experiments demonstrate that our framework consistently improves editing reliability and generalization under real-world evaluation, while better preserving locality and pre-trained capabilities.
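The following is a minimal Python/PyTorch sketch of the two stages as the abstract describes them, not the authors' released implementation: the function names, the KL form of the trust-region term, the beta weight, and the reward shapes are all assumptions made for illustration.

import torch
import torch.nn.functional as F

def tpsft_loss(edit_logits: torch.Tensor,   # (batch, seq, vocab) from the model being edited
               ref_logits: torch.Tensor,    # (batch, seq, vocab) from a frozen pre-edit copy
               target_ids: torch.Tensor,    # (batch, seq) token ids of the new fact
               edit_mask: torch.Tensor,     # (batch, seq) 1.0 on tokens in the edit span
               beta: float = 0.1) -> torch.Tensor:
    # SFT term: cross-entropy on the new fact, restricted to the edit span.
    ce = F.cross_entropy(edit_logits.transpose(1, 2), target_ids, reduction="none")
    ce = (ce * edit_mask).sum() / edit_mask.sum().clamp(min=1.0)
    # Trust-region term (assumed here as a token-level KL(edited || reference)):
    # penalizes drift from the pre-edit policy, discouraging overfitting.
    logp = F.log_softmax(edit_logits, dim=-1)
    ref_logp = F.log_softmax(ref_logits, dim=-1)
    kl = (logp.exp() * (logp - ref_logp)).sum(-1).mean()
    return ce + beta * kl

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # Group-relative baseline: normalize each sampled trajectory's reward
    # against its group's mean/std; rewards shaped (groups, samples_per_group).
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True).clamp(min=1e-6)
    return (rewards - mean) / std

# Toy usage with random tensors, just to show the expected shapes.
batch, seq, vocab = 2, 8, 100
edit_logits = torch.randn(batch, seq, vocab, requires_grad=True)
ref_logits = torch.randn(batch, seq, vocab)
target_ids = torch.randint(0, vocab, (batch, seq))
edit_mask = torch.ones(batch, seq)
loss = tpsft_loss(edit_logits, ref_logits, target_ids, edit_mask)

In practice the "targeted" part of TPSFT would restrict gradients to the localized knowledge layers (e.g., specific MLP weights) with the rest of the model frozen, and the GRPO advantages would weight a policy-gradient update over full reasoning trajectories during consolidation.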
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 15570