Progress Measures for Grokking on Real-world Tasks

Published: 16 Jun 2024, Last Modified: 20 Jun 2024
Venue: HiLD at ICML 2024 Poster
License: CC BY 4.0
Keywords: grokking, progress measures, machine learning, neural networks, generalization, overfitting
TL;DR: We find that low L2 weight norms do not explain grokking on real-world data, and we introduce three conceptually motivated progress measures that correlate with it more strongly.
Abstract: Grokking, a phenomenon in which machine learning models generalize long after overfitting, has primarily been observed and studied in algorithmic tasks. This paper explores grokking on real-world datasets using deep neural networks for classification under the cross-entropy loss. We challenge the prevalent hypothesis that the $L_2$ norm of the weights is the primary cause of grokking by demonstrating that grokking can occur outside the expected range of weight norms. To better understand grokking, we introduce three new progress measures: activation sparsity, absolute weight entropy, and approximate local circuit complexity. These measures are conceptually related to generalization and correlate more strongly with grokking on real-world datasets than weight norms do. Our findings suggest that although weight norms often correlate with grokking and with our progress measures, they are not causative, and the proposed measures provide a better understanding of the dynamics of grokking.
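Below is a minimal PyTorch sketch of how two of these quantities, together with the L2 weight-norm baseline, might be computed. The page does not give the paper's exact definitions, so the near-zero threshold for activation sparsity, the ReLU-only hooks, and the normalization used for absolute weight entropy are illustrative assumptions, not the authors' method; approximate local circuit complexity is omitted because its definition cannot be inferred from the abstract.

```python
# Hedged sketch (not the authors' implementation): plausible versions of the
# L2 weight-norm baseline, activation sparsity, and absolute weight entropy.
# The eps threshold, ReLU-only hooks, and entropy normalization are assumptions.
import torch
import torch.nn as nn


def l2_weight_norm(model: nn.Module) -> float:
    """L2 norm of all parameters, the baseline measure the paper challenges."""
    return torch.sqrt(sum(p.pow(2).sum() for p in model.parameters())).item()


def activation_sparsity(model: nn.Module, x: torch.Tensor, eps: float = 1e-3) -> float:
    """Assumed definition: fraction of post-ReLU activations with |a| < eps."""
    acts = []

    def hook(_module, _inputs, output):
        acts.append(output.detach())

    handles = [m.register_forward_hook(hook)
               for m in model.modules() if isinstance(m, nn.ReLU)]
    with torch.no_grad():
        model(x)
    for h in handles:
        h.remove()
    flat = torch.cat([a.flatten() for a in acts])
    return (flat.abs() < eps).float().mean().item()


def absolute_weight_entropy(model: nn.Module) -> float:
    """Assumed definition: Shannon entropy of normalized absolute weights."""
    w = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    p = w / w.sum()
    p = p[p > 0]  # drop exact zeros so p * log(p) is well defined
    return -(p * p.log()).sum().item()


if __name__ == "__main__":
    # Toy classifier; in the paper's setting this would be a deep network
    # trained with cross-entropy on a real-world dataset.
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    x = torch.randn(8, 32)
    print(l2_weight_norm(model),
          activation_sparsity(model, x),
          absolute_weight_entropy(model))
```

One plausible use is to log these scalars once per epoch alongside train and test accuracy, then compare their trajectories against the delayed generalization that characterizes grokking.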
Student Paper: No
Submission Number: 5