Track: long paper (up to 4 pages)
Keywords: Grokking, Delayed Generalization, Regularization, Sparsity, Low-Rank, Overparameterization, Gradient Descent, Implicit Regularization
TL;DR: Grokking, a sudden generalization following prolonged overfitting, can be triggered by alternative regularizers such as the $\ell_1$ and nuclear norms, or by leveraging depth-induced implicit biases, without relying solely on weight decay.
Abstract: Grokking refers to delayed generalization following overfitting when optimizing artificial neural networks with gradient-based methods.
In this work, we demonstrate that grokking can be induced by regularization, either explicit or implicit. More precisely, we show that when there exists a model with a property $P$ (e.g., sparse or low-rank weights) that generalizes on the problem of interest, gradient descent with a small but non-zero regularization toward $P$ (e.g., $\ell_1$ or nuclear norm regularization) results in grokking. This extends previous work showing that a small non-zero weight decay induces grokking. Moreover, our analysis shows that over-parameterization by adding depth makes it possible to grok or ungrok without explicit regularization, which is impossible in the shallow case. We further show that, in a general setting, the $\ell_2$ norm of the model parameters cannot replace the regularized property $P$ as an indicator of grokking: the $\ell_2$ norm grows in many cases where no weight decay is used, yet the model generalizes anyway. We also show that grokking can be amplified through data selection alone (with all other hyperparameters fixed).
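For concreteness, here is a minimal PyTorch sketch of the mechanism the abstract describes: gradient-based training with a small but non-zero penalty on a property $P$, using an $\ell_1$ penalty for sparsity (with a nuclear-norm variant for low rank shown in comments). This is an illustrative assumption of a setup, not the paper's code; the toy modular-addition task, architecture, and hyperparameters are all assumed.

```python
# Minimal sketch (assumed setup, not the authors' code): train on a toy
# modular-addition task with a small, non-zero regularizer promoting P.
import torch
import torch.nn as nn
import torch.nn.functional as F

p = 97
X = torch.cartesian_prod(torch.arange(p), torch.arange(p))  # all (a, b) pairs
y = (X[:, 0] + X[:, 1]) % p                                 # labels: (a + b) mod p
perm = torch.randperm(len(X))
train_idx = perm[: len(X) // 2]                             # train on half the pairs

model = nn.Sequential(
    nn.Embedding(p, 128),          # shared embedding for both operands
    nn.Flatten(start_dim=1),       # (N, 2, 128) -> (N, 256)
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, p),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.0)
lam = 1e-4                         # small but non-zero regularization strength

for step in range(100_000):
    logits = model(X[train_idx])
    task_loss = F.cross_entropy(logits, y[train_idx])
    # Regularize the property P directly: an l1 penalty encourages sparse
    # weights; the commented-out nuclear norm instead encourages low rank.
    reg = sum(w.abs().sum() for w in model.parameters())
    # reg = sum(torch.linalg.matrix_norm(w, ord="nuc")
    #           for w in model.parameters() if w.ndim == 2)
    loss = task_loss + lam * reg
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Per the abstract, with lam = 0 a shallow model can memorize the training pairs without generalizing, whereas a small non-zero lam is sufficient to trigger delayed generalization on a model possessing property $P$.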
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: ~Tikeng_Notsawo_Pascal_Junior1
Format: Maybe: the presenting author will attend in person, contingent on other factors that still need to be determined (e.g., visa, funding).
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Submission Number: 57