A simple and interpretable model of grokking modular arithmetic tasks

23 Sept 2023 (modified: 11 Feb 2024), Submitted to ICLR 2024
Primary Area: visualization or interpretation of learned representations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: grokking, mechanistic interpretability, emergent capabilities, emergence, physics of AI, phase transition, circuits, pattern formation, solvable model, superposition
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: I describe a simple neural network that (i) generalizes on modular arithmetic tasks and (ii) has fully interpretable weights and learnt representations (that is, they are known analytically).
Abstract: We present a simple neural network that generalizes on various modular arithmetic tasks, such as modular addition and multiplication, and exhibits a sudden jump in generalization known as grokking. Concretely, we present (i) fully connected two-layer networks that exhibit grokking on various modular arithmetic tasks when trained with vanilla gradient descent on the MSE loss, without any regularization; (ii) evidence that grokking modular arithmetic corresponds to learning specific representations whose structure is determined by the task; (iii) analytic expressions for the weights (and thus for the embedding) that solve a large class of modular arithmetic tasks; and (iv) evidence that these representations are found by both gradient descent and AdamW, establishing complete ("mechanistic") interpretability of the representations learnt by the network.
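The abstract does not spell out the analytic weights. To illustrate what "known analytically" can mean for modular addition, here is a minimal sketch of a hand-built two-layer network. It rests on several assumptions that are not taken from the page: a quadratic activation, a concatenated one-hot input for (a, b), and cosine (Fourier) weights whose phases run over a uniform grid. The names `analytic_weights` and `forward` and the grid size `m` are hypothetical. This is an illustrative construction, not necessarily the paper's exact expressions.

```python
import numpy as np


def analytic_weights(p: int, m: int = 3):
    """Hypothetical hand-built weights for a two-layer net computing (a + b) mod p.

    Each hidden neuron is labelled by a frequency f in 1..(p - 1) // 2 and a
    pair of phases (phi, psi) taken from a uniform grid of m >= 3 angles.
    """
    freqs = np.arange(1, (p - 1) // 2 + 1)
    grid = 2 * np.pi * np.arange(m) / m
    # Enumerate every (f, phi, psi) combination; N = m * m * (p - 1) // 2 neurons.
    f, phi, psi = (g.ravel() for g in np.meshgrid(freqs, grid, grid, indexing="ij"))
    theta = 2 * np.pi * f / p                  # angular frequency per neuron, shape (N,)
    n = np.arange(p)[:, None]                  # residues 0..p-1 as a column, shape (p, 1)
    w_a = np.cos(theta * n + phi)              # (p, N) embedding of the first operand
    w_b = np.cos(theta * n + psi)              # (p, N) embedding of the second operand
    w1 = np.concatenate([w_a, w_b], axis=0)    # (2p, N) first-layer weights
    w2 = np.cos(theta * n + phi + psi)         # (p, N) readout weights, row q = output class
    return w1, w2


def forward(w1, w2, a: int, b: int, p: int):
    """Forward pass with a quadratic activation (an assumption, not from the abstract)."""
    x = np.zeros(2 * p)
    x[a] = 1.0                                 # one-hot code of a in the first p slots
    x[p + b] = 1.0                             # one-hot code of b in the last p slots
    h = x @ w1                                 # pre-activations, shape (N,)
    return w2 @ h**2                           # logits over residues, shape (p,)


p = 23
w1, w2 = analytic_weights(p)
correct = all(
    np.argmax(forward(w1, w2, a, b, p)) == (a + b) % p
    for a in range(p)
    for b in range(p)
)
print(correct)  # True
```

Averaging over the phase grid cancels every cross term produced by squaring. As a result, the logit for class q is proportional to the sum over f of cos(2πf(a + b - q)/p). That sum equals (p - 1)/2 when q ≡ a + b (mod p) and -1/2 otherwise, so the readout peaks at the correct residue. The sketch only shows that closed-form weights of this kind can solve the task. Whether training finds them is the subject of point (iv) above.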
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8211