Reconstruct the Understanding of Grokking through Dynamical Systems

25 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0
Keywords: interpretability, grokking, dynamical systems, progress measures
TL;DR: We proposed a framework for understanding classification tasks in dynamical systems and used it to analyze grokking.
Abstract: \textbf{Grokking}, or the \textbf{delayed generalization phenomenon}, describes the abrupt and rapid improvement in test accuracy that occurs after a model has been overfitted for a prolonged period. This phenomenon was first identified by Power in the context of operations on a prime number field. Over the past two years, a range of mathematical analyses has been conducted to investigate grokking, typically involving the use of the hidden progress measure which mean a function that can anticipate the occurrence of grokking. We believe that a comprehensive and rigorous mathematical modeling approach can invigorate the research on this task and provide a unified perspective for understanding previous research. This paper introduces a novel approach by modeling the task as a unique dynamical system. Using mathematical derivation within this framework, we propose a robust hidden progress measure that effectively captures the grokking phenomenon across all operations on prime number fields. This approach not only provides a more complete understanding but also offers deeper insights into the underlying architecture of the model. Based on this understanding, we also proposed a method to accelerate grokking without involving regularization or altering the model architecture.
Primary Area: other topics in machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4284
Loading