Why Some Models Resist Unlearning: A Linear Stability Perspective

ICLR 2026 Conference Submission 21570 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: linear stability, optimization theory, optimization, unlearning, data geometry
Abstract: Machine unlearning, the ability to erase the effect of specific training samples without retraining from scratch, is critical for privacy, regulation, and efficiency. However, most progress in unlearning has been empirical, with little theoretical understanding of when and why it works. We address this gap by framing unlearning through the lens of asymptotic linear stability, which captures the interaction between optimization dynamics and data geometry. The key quantity in our analysis is data coherence: the cross-sample alignment of loss-surface directions near the optimum. We decompose coherence along three axes (within the retain set, within the forget set, and between them) and prove tight stability thresholds that separate convergence from divergence. To further link data properties to forgettability, we study a two-layer ReLU CNN under a signal-plus-noise model and show that stronger memorization makes forgetting easier: at lower signal-to-noise ratio (SNR), cross-sample alignment is weaker, which reduces coherence and eases unlearning; conversely, high-SNR, highly aligned models resist unlearning. Empirically, Hessian tests and CNN heatmaps align closely with the predicted boundary, mapping the stability frontier of gradient-based unlearning as a function of batching, mixing, and data/model alignment. Our analysis is grounded in tools from random matrix theory and provides the first principled account of the trade-offs between memorization, coherence, and unlearning.
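To make the three coherence axes in the abstract concrete, the minimal sketch below (our illustration, not code from the submission) proxies within-retain, within-forget, and retain-forget coherence by the average pairwise cosine alignment of per-sample gradients evaluated near the optimum; the paper's exact definition of coherence and its Hessian-based stability tests may differ. All data and names here (G_retain, G_forget, mean_pairwise_alignment) are hypothetical.

```python
import numpy as np

def mean_pairwise_alignment(G1, G2, exclude_diagonal=False):
    """Average cosine similarity between rows of G1 and rows of G2.

    G1, G2: (n1, d) and (n2, d) arrays of per-sample gradients
    (or other loss-surface directions) near the optimum.
    Set exclude_diagonal=True only when G1 and G2 are the same set,
    so self-similarity does not inflate the average.
    """
    G1n = G1 / (np.linalg.norm(G1, axis=1, keepdims=True) + 1e-12)
    G2n = G2 / (np.linalg.norm(G2, axis=1, keepdims=True) + 1e-12)
    S = G1n @ G2n.T  # pairwise cosine similarities
    if exclude_diagonal:
        mask = ~np.eye(S.shape[0], S.shape[1], dtype=bool)
        return S[mask].mean()
    return S.mean()

# Toy per-sample gradients for retain/forget sets (random placeholders).
rng = np.random.default_rng(0)
G_retain = rng.normal(size=(32, 100))
G_forget = rng.normal(size=(16, 100))

coh_rr = mean_pairwise_alignment(G_retain, G_retain, exclude_diagonal=True)
coh_ff = mean_pairwise_alignment(G_forget, G_forget, exclude_diagonal=True)
coh_rf = mean_pairwise_alignment(G_retain, G_forget)
print(f"retain-retain: {coh_rr:.3f}, forget-forget: {coh_ff:.3f}, "
      f"retain-forget: {coh_rf:.3f}")
```

Under the abstract's narrative, higher values of these alignment proxies would correspond to more coherent (high-SNR, strongly aligned) data that resists unlearning, while lower values would indicate weakly aligned, memorization-dominated data that is easier to forget.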
Supplementary Material: pdf
Primary Area: optimization
Submission Number: 21570