NegMerge: Sign-Consensual Weight Merging for Machine Unlearning

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
TL;DR: NegMerge is a novel machine unlearning method that computes task vectors from multiple models, combining only elements with consistent signs to effectively induce forgetting while preserving the model's retained knowledge.
Abstract: Machine unlearning aims to selectively remove specific knowledge from a trained model. Existing approaches, such as Task Arithmetic, fine-tune the model on the forget set to create a task vector (i.e., a direction in weight space) that is then subtracted from the original model's weights. However, their effectiveness is highly sensitive to hyperparameter selection, requiring extensive validation to identify the optimal vector from many fine-tuned candidates. In this paper, we propose a novel method that utilizes all fine-tuned models trained with varying hyperparameters instead of selecting a single one. Specifically, we aggregate the computed task vectors by retaining only the elements whose signs are consistent across them. The merged task vector is then negated to induce unlearning on the original model. Evaluations on zero-shot and standard image recognition tasks across twelve datasets and four backbone architectures show that our approach outperforms state-of-the-art methods while requiring similar or fewer computational resources. Code is available at https://github.com/naver-ai/negmerge.
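The merging step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). The function names `sign_consensus_merge` and `unlearn`, the `scale` parameter, the use of flat weight arrays, and the choice to average the retained elements are all assumptions made for illustration; the abstract only states that elements with consistent signs are kept and the merged vector is negated.

```python
import numpy as np


def sign_consensus_merge(task_vectors):
    """Merge task vectors, keeping only elements whose sign agrees in every vector.

    Assumption: retained elements are averaged; all other elements are set to zero.
    """
    stacked = np.stack(task_vectors)          # shape: (n_models, n_params)
    all_positive = np.all(stacked > 0, axis=0)
    all_negative = np.all(stacked < 0, axis=0)
    consensus = all_positive | all_negative   # True where every sign matches
    return np.where(consensus, stacked.mean(axis=0), 0.0)


def unlearn(original, finetuned_models, scale=1.0):
    """Subtract the merged task vector from the original weights.

    `scale` is a hypothetical scaling coefficient, not taken from the source.
    """
    # A task vector is the weight change produced by fine-tuning on the forget set.
    task_vectors = [ft - original for ft in finetuned_models]
    merged = sign_consensus_merge(task_vectors)
    return original - scale * merged          # negation induces forgetting


if __name__ == "__main__":
    original = np.array([1.0, 1.0, 1.0])
    # Two models fine-tuned on the forget set with different hyperparameters.
    finetuned = [np.array([1.5, 0.5, 1.2]), np.array([1.3, 1.5, 1.4])]
    print(unlearn(original, finetuned))
```

In this toy example the second weight moves in opposite directions across the two fine-tuned models, so it is left untouched, while the first and third weights are shifted against the direction both models agree on.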
Lay Summary: AI models learn from large amounts of data, but when someone asks for their data to be removed, it’s difficult to make the model forget just that part. Existing solutions, known as machine unlearning, are often sensitive to training settings and can hurt the model’s overall performance. To address this, we introduce a new method called NegMerge. Instead of picking just one model from many training runs and hoping it works, NegMerge uses all of them. It identifies what the models consistently agree should be forgotten and merges those parts to remove unwanted information, without affecting the rest. NegMerge helps AI forget specific data while keeping the rest of its knowledge intact, without wasting time on choosing a single best model. This makes unlearning not only more effective but also faster and more reliable.
Link To Code: https://github.com/naver-ai/negmerge
Primary Area: General Machine Learning->Transfer, Multitask and Meta-learning
Keywords: Machine Unlearning, Image Classification, Model Merging
Submission Number: 9234