Keywords: machine unlearning, grokking, deep learning, generalization
Abstract: The phenomenon of $\textbf{grokking}$, where deep neural networks achieve delayed but strong generalization long after fitting training data, challenges traditional views of model generalization. While previous work has shown that grokked models exhibit enhanced robustness, we establish a novel connection: grokked models are fundamentally better at machine unlearning—the process of removing specific data influences without full retraining. We provide comprehensive empirical evidence across CNNs and ResNets on CIFAR datasets, and transformers on text datasets. State-of-the-art unlearning algorithms (gradient ascent, SCRUB, Fisher forgetting, and fine-tuning) achieve significantly more efficient data removal when applied to grokked models. Critically, unlearned grokked models retain higher performance on remaining data and exhibit enhanced robustness compared to non-grokked counterparts. Our analysis reveals that grokking restructures internal representations, creating more disentangled knowledge that facilitates selective forgetting with minimal collateral damage. These findings establish the first systematic connection between grokking and machine unlearning, suggesting that grokking-induced training dynamics can be leveraged for more practical and robust privacy-preserving unlearning methods.
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 18652