Curvature Meets Bispectrum: A Correspondence Theory for Transformer Gauge Invariants

ICLR 2026 Conference Submission 19082 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Transformer, Diffusion, Gauge Symmetry, Fisher-Rao curvature, Bispectral invariants, Multi-head attention
TL;DR: Curvature and bispectrum correspond quantitatively: Fisher-Rao curvature lower-bounds bispectral energy in multi-head attention with 98.9% validity.
Abstract: Transformers contain substantial parameter redundancy: many weight settings compute the same function. Characterizing these equivalences is key for model comparison and optimization. We prove a quantitative correspondence between differential-geometric and harmonic-analytic invariants of neural network symmetries: in a linearized regime, Fisher-Rao curvature on the parameter-to-function quotient for multi-head attention lower-bounds permutation-bispectral energy, revealing the two invariants as complementary aspects of the same underlying structure. Empirical validation across model scales from 4 to 24 heads shows 98.9% validity of the theoretical bound, and the correspondence persists through 10,000 training steps. By bridging differential geometry and harmonic analysis, we provide both theoretical insight into Transformer symmetries and a practical geometric framework for identifying functionally equivalent models. We report the correspondence in native units, with curvature measured as a squared Frobenius norm throughout.
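The redundancy the abstract refers to can be illustrated with the simplest symmetry of multi-head attention: relabeling the heads. The sketch below is not the authors' construction; it assumes a plain single-layer attention map with per-head weights and no masking or bias, and the names `multi_head_attention` and `head_permutation_gap` are hypothetical. It checks numerically that two different weight settings, related by a head permutation, compute the same function up to floating-point rounding.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo):
    # X: (n, d_model); Wq, Wk, Wv: (H, d_model, d_head); Wo: (H, d_head, d_model).
    # The output is a sum of per-head contributions, so head order cannot matter.
    d_head = Wq.shape[-1]
    out = np.zeros_like(X)
    for h in range(Wq.shape[0]):
        scores = (X @ Wq[h]) @ (X @ Wk[h]).T / np.sqrt(d_head)
        out += softmax(scores) @ (X @ Wv[h]) @ Wo[h]
    return out

def head_permutation_gap(num_heads=4, n=5, d_model=8, d_head=2, seed=0):
    # Largest absolute output difference between the original weights and
    # the same weights with heads relabeled by a random permutation.
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d_model))
    Wq, Wk, Wv = (rng.normal(size=(num_heads, d_model, d_head)) for _ in range(3))
    Wo = rng.normal(size=(num_heads, d_head, d_model))
    perm = rng.permutation(num_heads)
    y = multi_head_attention(X, Wq, Wk, Wv, Wo)
    y_perm = multi_head_attention(X, Wq[perm], Wk[perm], Wv[perm], Wo[perm])
    return float(np.max(np.abs(y - y_perm)))
```

Calling `head_permutation_gap()` should return a value at the level of rounding error rather than exactly zero, since the per-head terms are summed in a different order. Invariants such as those named in the abstract are meant to be constant along orbits of symmetries like this one.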
Primary Area: learning theory
Submission Number: 19082