Provable Unlearning with Gradient Ascent on Two-Layer ReLU Neural Networks

18 Sept 2025 (modified: 12 Feb 2026) | ICLR 2026 Conference Desk Rejected Submission | CC BY 4.0
Keywords: Machine Unlearning, Theory, Approximate Unlearning, Implicit Bias
TL;DR: We provide a theoretical framework for machine unlearning, with guarantees that a simple gradient-ascent step can reliably remove the influence of specific data while preserving model performance.
Abstract: Machine unlearning aims to remove specific data from trained models, addressing growing privacy and ethical concerns. We provide a theoretical analysis of a simple and widely used method, gradient ascent, for reversing the influence of a specific data point without retraining from scratch. Leveraging the implicit bias of gradient descent towards solutions that satisfy the Karush-Kuhn-Tucker (KKT) conditions of a margin maximization problem, we quantify the quality of the unlearned model by evaluating how well it satisfies these conditions w.r.t. the retained data. To formalize this idea, we propose a new success criterion, termed $(\epsilon, \delta, \tau)$-successful unlearning, and show that, for both linear models and two-layer neural networks with high-dimensional data, a properly scaled gradient-ascent step satisfies this criterion and yields a model that closely approximates the retrained solution on the retained data. We also show that gradient ascent performs successful unlearning while still preserving generalization in a synthetic Gaussian-mixture setting.
Primary Area: learning theory
Submission Number: 12290
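To make the gradient-ascent step described in the abstract concrete, the following is a minimal sketch for a linear classifier. It is an illustration under stated assumptions, not the paper's construction: the logistic loss, the random high-dimensional data, the training loop, and the step size `step_size=5.0` are all hypothetical choices, and the sketch does not implement the paper's scaling of the step or its $(\epsilon, \delta, \tau)$ criterion.

```python
import numpy as np


def logistic_loss_grad(w, x, y):
    """Gradient of log(1 + exp(-y * <w, x>)) with respect to w."""
    margin = y * np.dot(w, x)
    return -y * x / (1.0 + np.exp(margin))


def gradient_ascent_unlearn(w, x_forget, y_forget, step_size):
    """Take one gradient-ascent step on the loss of the point to forget."""
    return w + step_size * logistic_loss_grad(w, x_forget, y_forget)


# Assumed toy setup: few samples in high dimension, random labels.
rng = np.random.default_rng(0)
n, d = 20, 500
X = rng.normal(size=(n, d)) / np.sqrt(d)
y = rng.choice([-1.0, 1.0], size=n)

# Train a linear model with plain gradient descent on the mean logistic loss.
w = np.zeros(d)
for _ in range(2000):
    margins = y * (X @ w)
    coeffs = -y / (1.0 + np.exp(margins))
    w -= 1.0 * (X * coeffs[:, None]).mean(axis=0)

# Unlearn the first training point with one ascent step (arbitrary step size).
forget = 0
w_unlearned = gradient_ascent_unlearn(w, X[forget], y[forget], step_size=5.0)

margin_before = y[forget] * (X[forget] @ w)
margin_after = y[forget] * (X[forget] @ w_unlearned)
print("margin on forgotten point before:", margin_before)
print("margin on forgotten point after: ", margin_after)
```

The ascent step always lowers the margin on the forgotten point, since it moves `w` against `y_forget * x_forget`. How far to move so that the result approximates the retrained solution on the retained data is the question the paper's analysis addresses; the step size here is only a placeholder.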