Is Delayed Robustness Really Grokking?

Stijn van den Beemt; Twan van Laarhoven; Marco Loog

16 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: neural networks, delayed robustness, grokking, adversarial attacks, deep learning

Abstract: We analyze delayed robustness, the phenomenon in which a neural network trained beyond overfitting becomes robust to adversarial attacks. It was first observed by Humayun et al. (2024), who characterized it as grokking behavior. We reproduce delayed robustness to PGD attacks in multiple setups and, using stronger attacks, show that this robustness is overestimated. We then demonstrate that delayed robustness is not grokking but the result of two unintended side effects of overtraining: softmax collapse in the cross-entropy loss function and an excessively large effective learning rate caused by gradient scaling in the Adam optimizer. We provide experimental evidence that these issues indeed produce networks that resist PGD attacks without becoming as robust to the stronger attacks. We also point out a relation to dying neurons and the slingshot effect. Finally, we show that simple interventions that resolve these issues prevent delayed robustness from appearing.
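The two side effects named in the abstract can be illustrated with a small numerical sketch. This is not the authors' code or experimental setup. It assumes that "softmax collapse" refers to float32 rounding that drives the cross-entropy loss, and the gradient on the correct logit, to exactly zero once the margin is large, and that "gradient scaling" refers to Adam's normalization of the update by the gradient magnitude. The function names `softmax_xent_grad` and `adam_first_step` are hypothetical, and the hyperparameters are common Adam defaults chosen only for illustration.

```python
import numpy as np


def softmax_xent_grad(logits, label):
    """Float32 softmax cross-entropy loss and its gradient w.r.t. the logits.

    Illustrative assumption: "softmax collapse" is modeled here purely as a
    float32 rounding effect; the paper may define it more precisely.
    """
    z = np.asarray(logits, dtype=np.float32)
    z = z - z.max()                      # usual max-shift for stability
    e = np.exp(z)                        # stays float32
    p = e / e.sum(dtype=np.float32)      # softmax probabilities
    loss = -np.log(p[label])
    grad = p.copy()
    grad[label] -= np.float32(1.0)       # d(loss)/d(logits) = p - onehot
    return float(loss), grad


def adam_first_step(grad, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Parameter update of the first Adam step for a scalar gradient.

    Illustrative assumption: a single step from zero-initialized moments,
    with bias correction as in the standard Adam update rule.
    """
    m = (1.0 - beta1) * grad
    v = (1.0 - beta2) * grad ** 2
    m_hat = m / (1.0 - beta1)            # bias-corrected first moment
    v_hat = v / (1.0 - beta2)            # bias-corrected second moment
    return -lr * m_hat / (np.sqrt(v_hat) + eps)


if __name__ == "__main__":
    # Moderate margin: loss and gradient are still informative.
    print(softmax_xent_grad([5.0, 0.0, 0.0], label=0))
    # Large margin: exp(-30) is absorbed when added to 1 in float32.
    print(softmax_xent_grad([30.0, 0.0, 0.0], label=0))
    # Adam takes steps of nearly the same size for very different gradients.
    print(adam_first_step(1.0), adam_first_step(1e-4))
```

Under these assumptions, a sufficiently confident float32 prediction yields a loss of exactly zero, while Adam's normalized update keeps the step size close to `lr` even when the gradients have shrunk by orders of magnitude.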

Supplementary Material: zip

Primary Area: learning theory

Submission Number: 7468
