On the Geometry and Topology of Neural Circuits for Modular Addition

Published: 30 Sept 2025, Last Modified: 30 Sept 2025, Mech Interp Workshop (NeurIPS 2025) Poster, CC BY 4.0
Keywords: Circuit analysis, Understanding high-level properties of models, Applications of interpretability
Other Keywords: modular addition, geometry, topology, representation learning, manifold hypothesis, universality
TL;DR: We find that networks (MLPs, transformers) with learnable embeddings approximate a torus-to-circle map and differ only in how they factor it; in particular, the clock and pizza algorithms are the same.
Abstract: Using tools from geometry and topology, we reveal that the circuits learned by neural networks trained on modular addition are simply different implementations of one global algorithmic strategy. We show that all architectures previously studied on this problem learn topologically equivalent algorithms. Notably, this finding concretely reveals that what appeared to be disparate circuits for modular addition in the literature are in fact equivalent through a topological lens. Furthermore, we introduce a new neural architecture that truly does learn a topologically distinct algorithm. We then resolve this apparent discrepancy through the lens of geometry and recover universality by showing that all networks studied learn modular addition by approximating a torus-to-circle map. They differ only in how they factor this map: either via 2D toroidal intermediate representations, or via combinations of certain projections of this 2D torus. We therefore argue that our geometric and topological perspective on neural circuits restores the universality hypothesis.
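To make the torus-to-circle picture concrete, here is a minimal numerical sketch (not the paper's code) of the "clock"-style computation the abstract describes: the inputs a and b are embedded as angles on two circles (jointly a 2D torus), the angles are added via trigonometric identities, and the answer is the output class whose angle best matches the sum. The modulus p = 97 and the name clock_add are illustrative assumptions, not taken from the paper.

```python
import numpy as np

p = 97  # modulus; an illustrative choice, not fixed by the paper


def clock_add(a: int, b: int) -> int:
    """Illustrative torus-to-circle computation of (a + b) mod p."""
    # Embed a and b as angles on two circles (together, a point on a torus).
    theta_a = 2 * np.pi * a / p
    theta_b = 2 * np.pi * b / p

    # Angle addition via trig identities -- the map from the torus to the output circle.
    cos_sum = np.cos(theta_a) * np.cos(theta_b) - np.sin(theta_a) * np.sin(theta_b)
    sin_sum = np.sin(theta_a) * np.cos(theta_b) + np.cos(theta_a) * np.sin(theta_b)

    # Score each candidate c by cos(theta_{a+b} - theta_c); the maximum is c = (a + b) mod p.
    thetas_c = 2 * np.pi * np.arange(p) / p
    logits = cos_sum * np.cos(thetas_c) + sin_sum * np.sin(thetas_c)
    return int(np.argmax(logits))


# Sanity check: the map reproduces modular addition exactly for all input pairs.
assert all(clock_add(a, b) == (a + b) % p for a in range(p) for b in range(p))
```

The architectures discussed in the paper differ, on this view, only in how the intermediate representation factors this composition, not in the underlying torus-to-circle map.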
Submission Number: 279