Track: Extended Abstract (non-archival, 4 pages)
Keywords: mechanistic interpretability, manifold hypothesis, geometry, topology, universality, representation learning
Abstract: The Clock and Pizza interpretations, associated with neural architectures
that differ in whether attention is uniform or learnable, were introduced to argue
that different architectural designs can yield distinct circuits for modular addition.
Applying geometric and topological analyses to the learned representations, we show
that this is not the case: Clock and Pizza circuits are topologically and geometrically
equivalent, and therefore constitute equivalent representations.
Submission Number: 39