Interpreting neural networks depends on the level of abstraction: Revisiting modular addition

28 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: mechanistic interpretability, group theory, universality
Abstract: Prior work in mechanistic interpretability has analyzed how neural networks solve modular arithmetic tasks, but conflicting interpretations have emerged, calling into question the universality hypothesis: that similar tasks lead to similar learned circuits. Revisiting modular addition, we find that these discrepancies stem from overly granular analyses, which obscure the higher-level patterns unifying seemingly disparate solutions. Using a multi-scale approach spanning the microscopic (neurons), mesoscopic (clusters of neurons), and macroscopic (entire network) levels, we show that all scales align on (approximate) cosets and implement an abstract algorithm resembling an approximate Chinese Remainder Theorem. Additionally, we propose a model in which the network aims for a constant logit margin, predicting $\mathcal{O}(\log(n))$ frequencies; this is more consistent with empirical results in networks with biases, which are more expressive and commonly used in practice, than the $\frac{n-1}{2}$ frequencies derived from bias-free networks. By uncovering shared structures across setups, our work provides a unified framework for understanding modular arithmetic in neural networks and generalizes existing insights to broader, more realistic scenarios.
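To make the setting concrete, below is a minimal sketch (not the authors' code) of the kind of experiment the abstract describes: a one-hidden-layer MLP with biases trained on (a + b) mod n, followed by a count of dominant Fourier frequencies in the learned first-layer weights. All names and hyperparameters here (n, width, learning rate, weight decay, the 5% threshold) are illustrative assumptions, not values taken from the paper.

# A minimal sketch (not the authors' code) of the setup the abstract revisits:
# a one-hidden-layer MLP *with biases* trained on (a + b) mod n, followed by a
# count of dominant Fourier frequencies in the learned first-layer weights.
# All hyperparameters (n, width, lr, weight decay, threshold) are assumptions.
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, width = 59, 128  # modulus and hidden width (assumed values)

# Full dataset: every pair (a, b), one-hot encoded, with label (a + b) mod n.
a = torch.arange(n).repeat_interleave(n)
b = torch.arange(n).repeat(n)
x = torch.cat([F.one_hot(a, n), F.one_hot(b, n)], dim=1).float()
y = (a + b) % n

model = torch.nn.Sequential(
    torch.nn.Linear(2 * n, width, bias=True),  # biases on: the more expressive regime the abstract emphasizes
    torch.nn.ReLU(),
    torch.nn.Linear(width, n, bias=True),
)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

for step in range(5000):  # grokking-style convergence may need more steps
    opt.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    opt.step()

# Fourier analysis of the weights acting on a's one-hot block. Under the
# constant-logit-margin picture, a network with biases should concentrate
# power in O(log n) frequencies, not the (n - 1) / 2 of the bias-free analysis.
W_a = model[0].weight[:, :n].detach()                 # shape: (width, n)
power = torch.fft.rfft(W_a, dim=1).abs().pow(2).sum(dim=0)
power = power[1:]                                     # drop the DC component
dominant = (power > 0.05 * power.max()).sum().item()  # threshold chosen arbitrarily
print(f"dominant frequencies: {dominant}  (log2(n) ~ {math.log2(n):.1f})")

A full-batch run over all n^2 pairs keeps the sketch short; the frequency count at the end is only a rough diagnostic of how concentrated the learned solution is in Fourier space.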
Primary Area: interpretability and explainable AI
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 13229