Keywords: Grokking, feature learning, training dynamics, Neural Tangent Kernel
Abstract: In this paper, we analyze the phenomenon of Grokking in deep neural networks trained on a sparse parity task, through the lens of feature learning. In particular, we analyze the evolution of the Neural Tangent Kernel (NTK) matrix. We show that during the initial overfitting phase, the NTK's eigenfunctions are not aligned with the predictive input features. At a later stage, however, the NTK's top eigenfunctions evolve to focus on the features of interest, and this shift coincides with the onset of the delayed generalization typically observed in Grokking. Our experiments can thus be viewed as a mechanistic interpretation of feature learning during training, seen through the evolution of the NTK's eigenfunctions.
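To make the kind of analysis described above concrete, the sketch below shows one way to compute the empirical NTK Gram matrix of a small network on a sparse parity task and measure how well its top eigenvector aligns with the labels. The architecture, task sizes (20 inputs, parity over the first 3 coordinates), and all names are illustrative assumptions, not details taken from the paper; tracking the alignment quantity over training checkpoints is one way to observe the delayed feature alignment the abstract describes.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy setup: a 2-layer tanh MLP on a sparse parity task.
# All hyperparameters here are illustrative, not from the paper.

def init_params(key, n_in=20, width=128):
    k1, k2 = jax.random.split(key)
    return {
        "W1": jax.random.normal(k1, (n_in, width)) / jnp.sqrt(n_in),
        "b1": jnp.zeros(width),
        "W2": jax.random.normal(k2, (width, 1)) / jnp.sqrt(width),
    }

def f(params, x):
    # Scalar network output for a single input x.
    h = jnp.tanh(x @ params["W1"] + params["b1"])
    return (h @ params["W2"]).squeeze(-1)

def empirical_ntk(params, X):
    # Per-example gradient of the scalar output w.r.t. all parameters.
    grads = jax.vmap(jax.grad(f, argnums=0), in_axes=(None, 0))(params, X)
    flat = jnp.concatenate(
        [g.reshape(X.shape[0], -1) for g in jax.tree_util.tree_leaves(grads)],
        axis=1,
    )
    # K[i, j] = <grad f(x_i), grad f(x_j)>
    return flat @ flat.T

key = jax.random.PRNGKey(0)
X = jax.random.choice(key, jnp.array([-1.0, 1.0]), shape=(256, 20))
y = jnp.prod(X[:, :3], axis=1)  # parity of the first 3 coordinates

params = init_params(key)
K = empirical_ntk(params, X)
eigvals, eigvecs = jnp.linalg.eigh(K)  # eigenvalues in ascending order

# Cosine alignment between the label vector and the top NTK eigenvector;
# at initialization this is typically small, and the abstract's claim is
# that it grows once feature learning kicks in.
top = eigvecs[:, -1]
alignment = jnp.abs(top @ y) / (jnp.linalg.norm(top) * jnp.linalg.norm(y))
print("top-eigenvector / label alignment:", float(alignment))
```

Re-running this computation on parameter snapshots saved along a training run would produce an alignment curve whose late rise, under the paper's account, should track the onset of generalization.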
Student Paper: Yes
Submission Number: 72