Abstract: Existing "locate-then-edit" approaches, which identify and perturb key parameters, often struggle in sequential editing scenarios, leading to overfitting, catastrophic forgetting, or model collapse. This paper introduces the Precise Neuron-Level Knowledge Editing (PNKE) framework, designed for efficient, low-interference knowledge updates via fine-grained neuron-level interventions. PNKE employs causal attribution to pinpoint the background and trigger neurons tied to target knowledge, then applies an entropy-guided sparse masking mechanism to select a critical neuron subset for targeted parameter updates. This design preserves editing precision while dynamically adjusting sparsity to maintain model stability during lifelong editing. In extensive lifelong editing experiments, PNKE outperforms state-of-the-art methods, achieving an editing success rate (Rel.) of 0.936, generalization (Gen.) of 0.891, and locality (Loc.) of 0.952 on benchmarks such as ZsRE and CounterFact. After 5,000 edits, PNKE sustains robust performance on downstream tasks such as MMLU and GSM8K, underscoring its stability and practical utility for continuous knowledge integration in LLMs.
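To make the entropy-guided sparse masking step concrete, here is a minimal sketch of one plausible reading of it: neurons are ranked by causal attribution scores, and the fraction kept scales with the normalized entropy of the attribution distribution, so diffusely attributed knowledge retains more neurons than sharply localized knowledge. This is an illustration, not the authors' implementation; the function name `entropy_guided_mask`, the ratio bounds, and the linear entropy-to-sparsity mapping are all assumptions.

```python
import torch

def entropy_guided_mask(attr_scores: torch.Tensor,
                        min_ratio: float = 0.01,
                        max_ratio: float = 0.10) -> torch.Tensor:
    """Select a sparse boolean mask over neurons from causal attribution
    scores (shape: num_neurons). Sparsity is adjusted dynamically: the
    higher the entropy of the attribution distribution, the more neurons
    are kept. The ratio bounds here are illustrative assumptions."""
    # Normalize attribution scores into a probability distribution.
    probs = torch.softmax(attr_scores, dim=0)
    entropy = -(probs * torch.log(probs + 1e-12)).sum()
    # Normalized entropy in [0, 1]: 1 = uniform attribution, 0 = one-hot.
    max_entropy = torch.log(torch.tensor(float(attr_scores.numel())))
    h = (entropy / max_entropy).item()
    # Map entropy linearly to a keep ratio, then take the top-k neurons.
    keep_ratio = min_ratio + h * (max_ratio - min_ratio)
    k = max(1, int(keep_ratio * attr_scores.numel()))
    mask = torch.zeros_like(attr_scores, dtype=torch.bool)
    mask[torch.topk(attr_scores, k).indices] = True
    return mask
```

Under this reading, a subsequent targeted update would be applied only to the parameters of masked neurons, which is one way the abstract's claim of low-interference sequential editing could be realized.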
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: feature attribution
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 194