Curvature Meets Bispectrum: A Correspondence Theory for Transformer Gauge Invariants

Hong Wang; Kelly Wang

19 Sept 2025 (modified: 12 Nov 2025), ICLR 2026 Conference Withdrawn Submission, CC BY 4.0

Keywords: Transformer, Diffusion, Gauge Symmetry, Fisher-Rao curvature, Bispectral invariants, Multi-head attention

TL;DR: Curvature and bispectrum correspond quantitatively: Fisher-Rao curvature lower-bounds bispectral energy in multi-head attention with 98.9% validity.

Abstract: Transformers contain substantial parameter redundancy: many weight settings compute the same function. Characterizing these equivalences is key for model comparison and optimization. We establish a quantitative correspondence between differential-geometric and harmonic-analytic invariants of neural network symmetries. Specifically, we prove that, in a linearized regime, the Fisher-Rao curvature on the parameter-to-function quotient for multi-head attention lower-bounds the permutation-bispectral energy, revealing these two invariants as complementary aspects of the same underlying structure. Empirical validation across model scales from 4 to 24 heads demonstrates 98.9% validity of the theoretical bound, with the correspondence persisting through 10,000 training steps. By bridging differential geometry and harmonic analysis, we provide both theoretical insight into Transformer symmetries and a practical geometric framework for identifying functionally equivalent models. We report the correspondence in native units, with curvature expressed as a squared Frobenius norm throughout.
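The redundancy mentioned in the abstract can be illustrated with one well-known symmetry of multi-head attention: relabeling the heads leaves the computed function unchanged. The NumPy sketch below is only an illustration under stated assumptions, not the authors' construction. The function `multi_head_attention`, the per-head weight layout, and all shapes are hypothetical choices made for this example, and the gauge group studied in the paper may be larger than head permutations alone.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo):
    """Hypothetical single-layer attention (no masking, no biases).

    X: (n, d) token features.
    Wq, Wk, Wv: (H, d, dh) per-head projections.
    Wo: (H, dh, d) per-head slices of the output projection.
    """
    dh = Wq.shape[-1]
    Q = np.einsum("nd,hde->hne", X, Wq)
    K = np.einsum("nd,hde->hne", X, Wk)
    V = np.einsum("nd,hde->hne", X, Wv)
    A = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(dh))  # (H, n, n)
    heads = A @ V                                        # (H, n, dh)
    # Summing over heads is what makes the head order irrelevant.
    return np.einsum("hne,hed->nd", heads, Wo)

rng = np.random.default_rng(0)
n, d, H, dh = 5, 8, 4, 2  # assumed toy sizes
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(H, d, dh)) for _ in range(3))
Wo = rng.normal(size=(H, dh, d))

# Apply the same head permutation to every per-head weight tensor.
perm = rng.permutation(H)
out = multi_head_attention(X, Wq, Wk, Wv, Wo)
out_perm = multi_head_attention(X, Wq[perm], Wk[perm], Wv[perm], Wo[perm])

# Distinct parameter settings, same function (up to rounding error).
max_diff = np.abs(out - out_perm).max()
print("max abs difference:", max_diff)
```

The two parameter settings differ as arrays whenever `perm` is not the identity, yet the outputs agree up to floating-point rounding. Quantities defined on the parameter-to-function quotient, such as the invariants discussed in the abstract, are meant to be insensitive to this kind of relabeling.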

Primary Area: learning theory

Submission Number: 19082
