Grokked Models are Better Unlearners

19 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference Desk Rejected Submission · CC BY 4.0
Keywords: machine unlearning, grokking, deep learning, generalization
Abstract: Grokking, the delayed generalization that emerges well after a model has fit the training data, has been linked to robustness and representation quality. We ask whether this training regime also helps with machine unlearning, i.e., removing the influence of specified data without full retraining. We compare applying standard unlearning methods before versus after the grokking transition across vision (CNNs/ResNets on CIFAR, SVHN, and ImageNet) and language (a transformer on a TOFU-style setup). Starting from grokked checkpoints consistently yields (i) more efficient forgetting (fewer updates to reach a target forget level), (ii) less collateral damage (smaller drops in retained and test performance), and (iii) more stable updates across seeds, relative to early-stopped counterparts under identical unlearning algorithms. Analyses of features and curvature further suggest that post-grokking models learn more modular representations with reduced gradient alignment between forget and retain subsets, which facilitates selective forgetting. Our results highlight when a model is trained (pre- vs. post-grokking) as a lever orthogonal to how unlearning is performed, providing a practical recipe to improve existing unlearning methods without altering their algorithms.
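The abstract's claim about "reduced gradient alignment between forget and retain subsets" refers to a standard diagnostic: the cosine similarity between the loss gradients computed on the two subsets. The paper does not specify its exact implementation, so the following is a minimal illustrative sketch in plain Python (the function name and toy inputs are assumptions, not the authors' code); in practice the inputs would be flattened per-subset parameter gradients.

```python
import math

def gradient_alignment(grad_forget, grad_retain):
    """Cosine similarity between flattened forget-set and retain-set
    gradient vectors. Values near 0 indicate the two subsets pull the
    parameters in nearly orthogonal directions, which the abstract links
    to easier selective forgetting; values near 1 indicate strong
    interference between forgetting and retention updates."""
    dot = sum(f * r for f, r in zip(grad_forget, grad_retain))
    norm_f = math.sqrt(sum(f * f for f in grad_forget))
    norm_r = math.sqrt(sum(r * r for r in grad_retain))
    return dot / (norm_f * norm_r)

# Toy example (hypothetical gradients, not from the paper):
print(gradient_alignment([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
print(gradient_alignment([1.0, 2.0], [1.0, 2.0]))  # identical  -> 1.0
```

In a real experiment the gradients would come from backpropagating the loss over the forget and retain splits separately at a given checkpoint; the paper's analysis compares this quantity at pre- versus post-grokking checkpoints.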
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 18652