Memorize to Forget: Machine Unlearning without Gradient Ascent via Model Extrapolation

15 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Machine Unlearning, Gradient Ascent, Memorization
TL;DR: We propose a novel unlearning framework that enhances memorization to achieve forgetting, avoiding the collapse caused by gradient ascent.
Abstract: For ethical and safe AI, machine unlearning has emerged as a critical topic, aiming to protect sensitive, private, and copyrighted knowledge from misuse. A common approach is gradient ascent (GA), which reverses training on the undesired data. However, such a reversal is prone to catastrophic collapse, causing severe performance degradation on general tasks. As a solution, we propose model extrapolation as an alternative to GA: given a reference model, it reaches the opposite direction in the hypothesis space from another model. Concretely, we use the original model as the reference and further train it, via gradient descent, to memorize the undesired data while keeping its predictions consistent on the retained data, yielding a memorization model. Counterintuitive as it may sound, a \textit{forget model} can then be obtained by extrapolating from the memorization model through the reference model. We thus avoid acquiring the forget model directly with GA and instead rely on gradient descent for the memorization model, which stabilizes the machine unlearning process. Model extrapolation is simple and efficient to implement, converges reliably throughout training, and achieves improved unlearning performance.
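The abstract does not state the exact extrapolation rule, but a natural reading is a linear step in parameter space away from the memorization model, through the reference model. The sketch below illustrates that assumed form, theta_forget = theta_ref + alpha * (theta_ref - theta_mem), on plain per-parameter dictionaries; the function name, the coefficient alpha, and the formula itself are illustrative assumptions, not the paper's verified method.

```python
def extrapolate_forget(reference, memorization, alpha=1.0):
    """Hypothetical sketch of model extrapolation for unlearning.

    Assumed rule (not stated in the abstract):
        theta_forget = theta_ref + alpha * (theta_ref - theta_mem)
    i.e., step from the reference model in the direction opposite to
    the memorization model, so forgetting moves away from memorizing.
    """
    return {
        name: ref_p + alpha * (ref_p - memorization[name])
        for name, ref_p in reference.items()
    }

# Toy example with scalar "parameters": memorization moved w and b
# toward the undesired data; extrapolation pushes them the other way.
ref = {"w": 1.0, "b": 0.5}
mem = {"w": 1.4, "b": 0.7}
forget = extrapolate_forget(ref, mem, alpha=1.0)
print(forget)  # w moves below 1.0, b moves below 0.5
```

In practice the same arithmetic would be applied tensor-wise to two checkpoints of the same architecture, with alpha controlling how aggressively the forget model departs from the reference.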
Primary Area: foundation or frontier models, including LLMs
Submission Number: 5812