Keywords: bispectral invariance, quotient manifolds, harmonic analysis, model equivalence, gauge symmetry
TL;DR: We show Fisher-Rao curvature lower-bounds bispectral energy in Transformer attention, unifying geometric and algebraic invariants. A hybrid pipeline speeds equivalence testing; validated across scales with 98.9% bound validity.
Abstract: Understanding which parameter changes leave a Transformer's function unchanged is essential for model comparison, optimization, and interpretability. This paper establishes a quantitative correspondence between geometric and algebraic approaches to neural network invariance, unifying two previously disconnected mathematical frameworks. We prove that Fisher-Rao curvature on the parameter-to-function quotient for multi-head attention provides a lower bound for bispectral energy in a linearized regime, revealing these two invariants as complementary aspects of the same underlying structure. Our theoretical framework yields practical benefits: a hybrid computational pipeline that substantially reduces runtime relative to pure algebraic methods while maintaining high discrimination accuracy for equivalence testing. Empirical validation across model scales from 4 to 24 heads demonstrates 98.9% validity of the theoretical bound, with the correspondence persisting through 10,000 training steps. By bridging differential geometry and harmonic analysis, we provide both theoretical insight into Transformer symmetries and efficient algorithms for identifying functionally equivalent models in practice.
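The hybrid pipeline described in the abstract (a cheap geometric screen followed by a costlier algebraic confirmation) can be sketched in miniature. This is a toy illustration only: `geometric_invariant` and `algebraic_invariant` are hypothetical stand-ins for the paper's Fisher-Rao curvature and bispectral energy, chosen here simply because both are preserved under head permutation, one of the function-preserving symmetries the paper concerns.

```python
import numpy as np

rng = np.random.default_rng(0)

def geometric_invariant(W):
    """Cheap stand-in for a Fisher-Rao curvature summary (hypothetical).

    Singular values are invariant under orthogonal conjugation, so a
    permutation of heads leaves this quantity unchanged.
    """
    s = np.linalg.svd(W, compute_uv=False)
    return float(np.sum(np.log1p(s) ** 2))

def algebraic_invariant(W):
    """Costlier stand-in for bispectral energy (hypothetical).

    A third-order statistic of the spectrum; eigenvalues are preserved
    under similarity transforms such as P @ W @ P.T.
    """
    ev = np.linalg.eigvals(W)
    return float(np.sum(np.abs(ev) ** 3))

def equivalent(W1, W2, tol=1e-6):
    """Hybrid test: screen with the cheap invariant, confirm with the expensive one."""
    if abs(geometric_invariant(W1) - geometric_invariant(W2)) > tol:
        return False  # cheap geometric screen rejects most non-equivalent pairs early
    return abs(algebraic_invariant(W1) - algebraic_invariant(W2)) <= tol

W = rng.standard_normal((8, 8))
P = np.eye(8)[rng.permutation(8)]  # head permutation: a function-preserving symmetry
print(equivalent(W, P @ W @ P.T))                  # permuted copy passes both checks
print(equivalent(W, rng.standard_normal((8, 8))))  # unrelated weights are rejected
```

The point of the two-stage structure is purely economic: the geometric invariant is fast and filters out most candidate pairs, so the expensive algebraic invariant runs only on the survivors, which is how the paper's pipeline reduces runtime relative to a purely algebraic method.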
Submission Number: 21