Abstract: Large language models (LLMs) require continual knowledge updates to keep pace with the evolving world. While various model editing methods have been proposed, most face critical challenges in lifelong learning contexts due to two fundamental limitations: (1) Edit Overshooting - parameter updates intended for a specific fact spill over to unrelated regions, causing interference with previously retained knowledge; and (2) Knowledge Entanglement - polysemantic neurons' overlapping encoding of multiple concepts makes it difficult to isolate and edit a single fact. In this paper, we propose MicroEdit, a neuron-level editing method that performs minimal and controlled interventions within LLMs. By leveraging a sparse autoencoder (SAE), MicroEdit disentangles knowledge representations and
activates only a minimal set of necessary neurons for precise parameter updates. This targeted design enables fine-grained control over the editing scope, effectively mitigating interference and preserving unrelated knowledge. Extensive experiments show that MicroEdit outperforms prior methods and robustly handles lifelong knowledge editing across QA and Hallucination settings on LLaMA and Mistral.
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: model editing
Languages Studied: English
Submission Number: 6505
Loading