Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok

ICLR 2024 Workshop ME-FoMo Submission 12 Authors

Published: 04 Mar 2024, Last Modified: 05 May 2024 · ME-FoMo 2024 Poster · CC BY 4.0
Keywords: grokking, generalization, loss landscape, spectral energy
TL;DR: We propose a cost-effective method to predict grokking in neural networks by analyzing early learning curves and detecting specific oscillations with the Fourier transform. Additional experiments explore the oscillations' origins and characterize the loss landscape.
Abstract: This paper presents a cost-effective method for predicting grokking in neural networks, i.e., delayed perfect generalization that follows overfitting or memorization. By analyzing the learning curve of the first few epochs, we show that certain oscillations forecast grokking in extended training. Our approach efficiently detects these oscillations via the Fourier transform's spectral signature. Additional experiments explore their origins and characterize the loss landscape.
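
The abstract describes detecting oscillations in early learning curves through a Fourier-based spectral signature. The sketch below is an illustrative interpretation, not the authors' exact procedure: it computes the power spectrum of a detrended loss curve and scores how much energy lies outside the lowest-frequency bins. The band split (`low_freq_bins`) and the placeholder loss curve are assumptions for demonstration.

```python
import numpy as np

def spectral_signature(loss_curve: np.ndarray) -> np.ndarray:
    """Return the power spectrum of a detrended learning curve."""
    x = np.asarray(loss_curve, dtype=float)
    x = x - x.mean()                      # remove the DC component
    spectrum = np.fft.rfft(x)             # real FFT over training steps
    return np.abs(spectrum) ** 2          # spectral energy per frequency bin

def oscillation_score(loss_curve: np.ndarray, low_freq_bins: int = 5) -> float:
    """Fraction of spectral energy outside the lowest-frequency bins.

    A large value indicates that the early curve oscillates rather than
    decaying smoothly -- the kind of signal the paper associates with
    grokking later in training. The cutoff of 5 bins is an assumption.
    """
    power = spectral_signature(loss_curve)
    total = power.sum() + 1e-12           # avoid division by zero
    return float(power[low_freq_bins:].sum() / total)

# Usage: `early_losses` stands in for the training loss of the first few epochs.
early_losses = np.random.rand(200)        # placeholder for a real learning curve
print(f"oscillation score: {oscillation_score(early_losses):.3f}")
```

In practice one would feed in the recorded per-step training (or validation) loss from the first few epochs and compare the score against curves from runs known to grok or not grok.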
Submission Number: 12