Is Delayed Robustness Really Grokking?

16 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: neural networks, delayed robustness, grokking, adversarial attacks, deep learning
Abstract: We analyze the phenomenon of delayed robustness, in which a neural network trained well beyond the point of overfitting becomes robust to adversarial attacks. This phenomenon was first observed by Humayun et al. (2024), who characterized it as grokking. We reproduce delayed robustness to PGD attacks in multiple setups and, using stronger attacks, show that this robustness is overestimated. We then demonstrate that delayed robustness is not grokking but the result of two unintended side effects of overtraining: softmax collapse in the cross-entropy loss and an excessively large effective learning rate caused by gradient scaling in the Adam optimizer. We provide experimental evidence that these issues produce networks that resist PGD attacks without becoming correspondingly robust to the stronger attacks. We also point out a connection to dying neurons and the slingshot effect. Finally, we show that when these issues are removed with simple interventions, delayed robustness does not appear.
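The two mechanisms named in the abstract can be illustrated on a toy problem. The sketch below is a hypothetical numpy example, not the authors' experimental code: the linear classifier, its weights, and the helper names (`ce_input_grad`, `margin_input_grad`, `signed_step`, `adam_first_step`) are assumptions made for illustration. Part 1 assumes that softmax collapse means the float32 softmax output becomes exactly one-hot, so the cross-entropy gradient that PGD follows is exactly zero, while a margin-based loss (used here only as a stand-in for a stronger attack; the abstract does not name the attacks) still yields a usable gradient. Part 2 shows that Adam's normalization keeps the first update close to `lr` even when the gradient is tiny, so the step taken relative to the gradient grows as the loss approaches zero.

```python
import numpy as np

# --- Part 1: softmax collapse masks the cross-entropy gradient used by PGD ---

def softmax(z):
    # Stable softmax computed in z's own dtype (max-subtraction before exp).
    e = np.exp(z - z.max())
    return e / e.sum()

def ce_input_grad(W, b, x, y):
    # Cross-entropy gradient w.r.t. the input of a linear model z = W x + b:
    # dL/dx = W^T (softmax(z) - onehot(y)).
    p = softmax(W @ x + b)
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    return W.T @ (p - onehot)

def margin_input_grad(W, b, x, y):
    # Gradient of a margin loss max_{j != y} z_j - z_y w.r.t. x.
    # It never passes through the softmax, so it cannot collapse.
    z = W @ x + b
    others = z.copy()
    others[y] = -np.inf
    j = int(np.argmax(others))
    return W[j] - W[y]

def signed_step(x, grad, alpha=0.1):
    # One L-inf PGD ascent step (projection onto the eps-ball omitted for brevity).
    return x + alpha * np.sign(grad)

# Hypothetical "overtrained" linear classifier: large weights give logits
# (200, 0, -200), and exp(-200) underflows to exactly 0 in float32.
W = np.array([[1, 0], [0, 1], [-1, -1]], dtype=np.float32) * 200
b = np.zeros(3, dtype=np.float32)
x = np.array([1, 0], dtype=np.float32)
y = 0

g_ce = ce_input_grad(W, b, x, y)          # all zeros: softmax output is exactly one-hot
g_margin = margin_input_grad(W, b, x, y)  # nonzero
x_ce = signed_step(x, g_ce)               # sign(0) = 0, so x does not move
x_margin = signed_step(x, g_margin)       # the margin-based step still moves x

# --- Part 2: Adam keeps the step near lr as gradients shrink ---

def adam_first_step(g, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # First Adam update from zero-initialized moments, with bias correction.
    m = (1 - beta1) * g
    v = (1 - beta2) * g**2
    m_hat = m / (1 - beta1)   # equals g
    v_hat = v / (1 - beta2)   # equals g**2
    # Step is lr * g / (|g| + eps): roughly lr * sign(g) whenever |g| >> eps.
    return lr * m_hat / (np.sqrt(v_hat) + eps)

for g in (1.0, 1e-6):
    print(f"grad={g:g}  sgd_step={1e-3 * g:.1e}  adam_step={adam_first_step(g):.1e}")
```

Changing every `np.float32` in Part 1 to `np.float64` avoids the underflow: the cross-entropy gradient then becomes tiny but nonzero, and its sign matches the margin-loss direction in this example. This suggests that the masking here is a numerical effect of limited precision rather than genuine robustness. In Part 2, plain SGD with the same `lr` would shrink its step by the same factor as the gradient, whereas Adam's step stays near `1e-3`.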
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 7468