Keywords: model editing, LLM unlearning, task vector, machine unlearning
Abstract: To unlearn certain entities in **large language models** (LLMs), model editing is performed by subtracting an entity-specific task vector (TV), the parameter difference between the entity-tuned model and the original model, from the full LLM. Unlike training-based methods, this avoids costly iterative training. However, because the TV can overlap with LLM parameters essential for retaining knowledge, model editing may suffer from over-forgetting. Observing that each parameter may differ in importance for entities to be unlearned versus retained, in this paper we propose a parameter-wise **weighted model editing** (WME) mechanism that rescales the TV, allowing flexible adjustment of the editing magnitude. These parameter-wise weights quantify the relative importance of each parameter for forgetting versus retention, estimated via ***grad**ients* (i.e., WME-grad) or the *diagonal **Fisher** information approximation* (i.e., WME-fisher). Furthermore, we extend WME to a more general form and discuss its effectiveness. Results on unlearning benchmarks show that WME outperforms the vanilla TV baseline and even surpasses popular training-based unlearning methods in both forgetting quality and model utility. While preserving the efficiency of model editing-based approaches, WME maintains the model's capacity to retain knowledge, offering a new perspective on both LLM unlearning and flexible LLM editing. Our code is available at https://anonymous.4open.science/r/WME.
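The weighted-subtraction mechanism described in the abstract can be sketched as follows. This is a minimal illustrative implementation, not the paper's actual code: the function name `weighted_model_edit`, the relative-importance weighting `imp_forget / (imp_forget + imp_retain)`, and the scaling factor `alpha` are assumptions for illustration; the paper's exact weight formula and general form may differ.

```python
import numpy as np

def weighted_model_edit(theta_orig, theta_tuned, imp_forget, imp_retain,
                        alpha=1.0, eps=1e-8):
    """Hypothetical sketch of parameter-wise weighted model editing (WME).

    theta_orig / theta_tuned: dicts mapping parameter names to arrays
        (original LLM and entity-tuned model parameters).
    imp_forget / imp_retain: per-parameter importance scores for the
        forget entity vs. retained knowledge, e.g. squared gradients
        (a diagonal Fisher approximation), same shapes as the parameters.
    """
    edited = {}
    for name in theta_orig:
        # Entity-specific task vector: tuned minus original.
        tv = theta_tuned[name] - theta_orig[name]
        # Relative importance for forgetting vs. retention, in [0, 1]:
        # near 1 where the parameter mainly encodes the forget entity,
        # near 0 where it mainly supports retained knowledge.
        w = imp_forget[name] / (imp_forget[name] + imp_retain[name] + eps)
        # Rescaled subtraction: vanilla TV editing would use w = 1 everywhere.
        edited[name] = theta_orig[name] - alpha * w * tv
    return edited
```

With this weighting, parameters important only for retained knowledge are left nearly untouched (mitigating over-forgetting), while parameters important only for the forget entity receive close to the full TV subtraction.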
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 10290