Keywords: mechanistic interpretation, training dynamics, modular addition, feature learning
TL;DR: We demystify the feature learning and training dynamics of gradient-based training on the modular addition task.
Abstract: We present a comprehensive analysis of how two-layer neural networks learn features to solve the modular addition task.
Our work provides a full mechanistic interpretation of the learned model and a theoretical explanation of its training dynamics.
First, we empirically show that trained networks learn a sparse Fourier representation; each neuron's parameters form a trigonometric pattern corresponding to a single frequency.
We identify two key structural properties: phase alignment, where a neuron's output phase is twice its input phase, and model symmetry, where phases are uniformly distributed among neurons sharing the same frequency, particularly when overparametrized.
We prove that these properties allow the network to collectively approximate an indicator function of the correct sum for the modular addition task.
While individual neurons produce noisy signals, the phase symmetry enables a majority-voting scheme that cancels out noise, allowing the network to robustly identify the correct sum.
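To make this construction concrete, here is a minimal NumPy sketch (not the paper's code) that hand-builds such a two-layer network under simplifying assumptions: a quadratic activation, one-hot inputs, a hypothetical modulus p = 23, and four neurons per frequency with uniformly spaced phases. Each neuron uses a single frequency, its output phase is twice its input phase, and summing over a phase group cancels the phase-dependent terms, so the largest logit recovers (a + b) mod p.

```python
import numpy as np

p = 23                      # modulus (hypothetical choice for illustration)
phases_per_freq = 4         # neurons sharing each frequency
freqs = np.arange(1, (p - 1) // 2 + 1)                        # one representative per +/- frequency pair
phis = np.pi * np.arange(phases_per_freq) / phases_per_freq   # uniformly spaced phases

residues = np.arange(p)
theta = 2 * np.pi * residues / p      # angle attached to each residue

logits = np.zeros((p, p, p))          # logits[a, b, c]
for k in freqs:
    for phi in phis:
        w_a = np.cos(k * theta + phi)          # input weights on the one-hot encoding of a
        w_b = np.cos(k * theta + phi)          # input weights on the one-hot encoding of b
        w_out = np.cos(k * theta + 2 * phi)    # output weights: phase is twice the input phase
        pre = w_a[:, None] + w_b[None, :]      # pre-activation for every pair (a, b)
        act = pre ** 2                         # quadratic activation (simplifying assumption)
        logits += act[:, :, None] * w_out[None, None, :]

pred = logits.argmax(axis=2)
truth = (residues[:, None] + residues[None, :]) % p
print("accuracy:", (pred == truth).mean())     # 1.0: the phase group votes out the wrong sums
```

The phase-averaging step is the majority vote: terms that depend on an individual neuron's phase cancel across the uniformly spaced group, leaving only the component aligned with a + b - c.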
We then explain how these features are learned through a "lottery ticket mechanism".
An analysis of the gradient flow reveals that frequencies compete within each neuron during training.
The frequency that ultimately wins is determined predictably by its initial magnitude and phase misalignment.
Finally, we use these insights to demystify grokking, characterizing it as a three-stage process involving memorization followed by two generalization phases driven by feature sparsification.
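These training-dynamics claims suggest a simple diagnostic: project each neuron's input weights onto the Fourier basis and track which frequency dominates over training. The sketch below is a hypothetical helper (not the paper's code) that assumes the first-layer weights act on a one-hot encoding of one operand; it returns each neuron's dominant frequency, that frequency's phase, and the fraction of the weight vector's Fourier energy it carries, quantities one could log at checkpoints to watch the within-neuron frequency competition and the sparsification that accompanies grokking.

```python
import numpy as np

def dominant_frequency(w_in, p):
    """For each hidden neuron, find the Fourier component that dominates its
    input weights over the residues 0..p-1, plus that component's phase.

    w_in: array of shape (num_neurons, p) -- the slice of first-layer weights
          acting on the one-hot encoding of one operand (hypothetical layout).
    Returns (freqs, phases, sparsity), where sparsity is the fraction of the
    weight vector's Fourier energy carried by the dominant frequency.
    """
    spectrum = np.fft.rfft(w_in, axis=1)            # shape (num_neurons, p // 2 + 1)
    power = np.abs(spectrum) ** 2
    power[:, 0] = 0.0                               # ignore the constant (DC) component
    k = power.argmax(axis=1)                        # dominant frequency per neuron
    rows = np.arange(w_in.shape[0])
    phase = np.angle(spectrum[rows, k])             # phase of the dominant component
    sparsity = power[rows, k] / power.sum(axis=1)   # -> 1.0 for a pure single-frequency neuron
    return k, phase, sparsity

# Example on a synthetic single-frequency neuron: frequency 5, phase ~0.7, sparsity ~1.0.
p = 23
w = np.cos(2 * np.pi * 5 * np.arange(p) / p + 0.7)[None, :]
print(dominant_frequency(w, p))
```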
Primary Area: interpretability and explainable AI
Submission Number: 20794