Keywords: machine unlearning, grokking, deep learning, generalization
Abstract: The phenomenon of $\textbf{grokking}$, where deep neural networks achieve delayed but strong generalization long after fitting training data, challenges traditional views of model generalization. While previous work has shown that grokked models exhibit enhanced robustness, we establish a novel connection: grokked models are fundamentally better at machine unlearning—the process of removing specific data influences without full retraining. We provide comprehensive empirical evidence across CNNs and ResNets on CIFAR datasets, and transformers on text datasets. State-of-the-art unlearning algorithms (gradient ascent, SCRUB, Fisher forgetting, and fine-tuning) achieve significantly more efficient data removal when applied to grokked models. Critically, unlearned grokked models retain higher performance on remaining data and exhibit enhanced robustness compared to non-grokked counterparts. Our analysis reveals that grokking restructures internal representations, creating more disentangled knowledge that facilitates selective forgetting with minimal collateral damage. These findings establish the first systematic connection between grokking and machine unlearning, suggesting that grokking-induced training dynamics can be leveraged for more practical and robust privacy-preserving unlearning methods.
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 18652