One puzzling artifact in machine learning, dubbed grokking, refers to the case where a model exhibits delayed generalization: test performance improves suddenly only after many training iterations beyond the point of near-perfect overfitting. Focusing on this long delay from the perspective of machine learning practitioners, our primary goal is to accelerate the generalization of a model under the grokking phenomenon. By regarding the sequence of gradients of a parameter over training iterations as a random signal over time, we can spectrally decompose the parameter trajectories under gradient descent into two components: a fast-varying, overfitting-yielding component and a slow-varying, generalization-inducing component. This analysis allows us to accelerate grokking by more than $\times 50$ with only a few lines of code that amplify the slow-varying components of the gradients. Our experiments show that the algorithm applies to diverse tasks involving images, languages, and graphs, making this peculiar artifact of sudden generalization practically available. Moreover, we reinterpret the momentum hyperparameters of gradient-based optimizers as low-pass filters with size-1 windows. This bridges optimization and classical signal processing literature, suggesting a new type of optimizer augmented with frequency-domain filters.
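To make the "few lines of code" concrete, the following is a minimal sketch of the idea of amplifying the slow-varying gradient component. It is not the authors' released implementation; the function name `filtered_grad`, the exponential-moving-average low-pass filter, and the hyperparameter values (`alpha`, `lam`) are illustrative assumptions, demonstrated on a toy quadratic objective rather than a grokking task.

```python
import numpy as np

def filtered_grad(grad, ema, alpha=0.98, lam=2.0):
    """Hypothetical slow-gradient amplification step.

    ema   -- exponential moving average of past gradients; acts as a
             low-pass filter extracting the slow-varying component.
    lam   -- how strongly the slow component is amplified.
    Returns the modified gradient and the updated filter state.
    """
    ema = alpha * ema + (1.0 - alpha) * grad  # low-pass filter update
    return grad + lam * ema, ema              # amplify slow component

# Toy demonstration: gradient descent on f(w) = 0.5 * w^2,
# whose gradient is simply w, using the filtered gradient.
w, ema, lr = 5.0, 0.0, 0.05
for _ in range(500):
    grad = w                                  # df/dw for the toy objective
    g, ema = filtered_grad(grad, ema)
    w -= lr * g

# w approaches the minimum at 0; the slow component dominates here,
# so the effective step along it is roughly (1 + lam) times larger.
print(abs(w))
```

In a real training loop the same filter state would be kept per parameter tensor and the amplified gradient handed to the usual optimizer step; only the filtering and amplification lines are added.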