Keywords: grokking, generalization, loss landscape, spectral energy
TL;DR: We propose a cost-effective method to predict grokking in neural networks by analyzing early learning curves and detecting specific oscillations using the Fourier transform. Additional experiments explore the oscillations' origins and characterize the loss landscape.
Abstract: This paper presents a cost-effective method for predicting grokking in neural networks, i.e., delayed perfect generalization following overfitting or memorization. By analyzing the learning curve of the first few epochs, we show that certain oscillations forecast grokking in extended training. Our approach detects these oscillations efficiently via a \emph{spectral signature} computed with the Fourier transform. Additional experiments explore the oscillations' origins and characterize the loss landscape.
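To make the abstract's idea concrete, here is a minimal, hypothetical sketch of what a Fourier-based oscillation detector on an early learning curve might look like. This is not the paper's implementation; the function name `spectral_signature`, the frequency band `k_low:k_high`, and the toy curves are illustrative assumptions.

```python
import numpy as np

def spectral_signature(loss_curve, k_low=2, k_high=20):
    """Fraction of spectral energy in a mid-frequency band of a learning curve.

    A larger value indicates more pronounced oscillations in the early
    epochs; the frequency band is an assumed hyperparameter, not one
    taken from the paper.
    """
    # Remove the mean so the spectrum reflects oscillations, not the offset.
    detrended = loss_curve - np.mean(loss_curve)
    spectrum = np.abs(np.fft.rfft(detrended)) ** 2
    total = spectrum.sum()
    if total == 0:
        return 0.0
    return spectrum[k_low:k_high].sum() / total

# Toy example: a smooth loss decay vs. the same decay with an
# added oscillation; the oscillating curve scores higher.
epochs = np.arange(200)
smooth = np.exp(-epochs / 50)
oscillating = smooth + 0.05 * np.sin(2 * np.pi * epochs / 15)
print(spectral_signature(smooth), spectral_signature(oscillating))
```

In practice one would compare the signature of a candidate run's early validation-loss curve against a threshold calibrated on runs known to grok or not.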
Submission Number: 12