Asymptotic Dynamics for Delayed Feature Learning in a Toy Model

Published: 16 Jun 2024, Last Modified: 16 Jun 2024, HiLD at ICML 2024 Poster, CC BY 4.0
Keywords: grokking; dynamics; DMFT; feature learning
TL;DR: We derive in closed form the asymptotic dynamics underlying grokking in a simple polynomial regression setting.
Abstract: We consider a toy model exhibiting grokking that was recently put forward by [Kumar et al., 2023], and exploit its simplicity to derive the dynamics of the train and test loss using Dynamical Mean Field Theory (DMFT). This yields a closed-form expression for the gap between train and test loss that characterizes grokking in this toy model. The expression shows how two parameters of interest, NTK alignment and network laziness, control the size of the gap, and how grokking emerges as a uniquely offline property of repeated training over the same dataset. This is the first quantitative characterization of grokking dynamics in a general setting that makes no assumptions about quantities such as weight decay or weight norm.
Student Paper: Yes
Submission Number: 3