Bispectral Invariants for Transformers: An Operator-Algebraic Approach

NeurIPS 2025 Workshop NeurReps Submission 17 Authors

20 Aug 2025 (modified: 29 Oct 2025). Submitted to NeurReps 2025. License: CC BY 4.0
Keywords: Bispectral Invariant, Operator Algebra, Gauge Symmetry, Gauge Group, Harmonic Analysis
TL;DR: Operator algebras reveal Transformer gauge symmetry (98K+ redundant dims/layer). Bispectral invariants computationally detect this Morita equivalence, identifying functionally identical models.
Abstract: Modern Transformer models exhibit massive parameter redundancy, with millions of distinct configurations yielding identical functions. We provide the first complete characterization of this phenomenon through the maximal gauge group $G_{\max} = ((\mathrm{GL}(d_k))^h \times (\mathrm{GL}(d_v))^h) \rtimes S_h$. Our approach combines operator-algebraic methods with harmonic analysis to develop complete computational invariants for representation equivalence. We formalize Transformer layers as modules over C*-algebras, enabling rigorous analysis via Morita theory and Fredholm indices. The centerpiece of our framework is the G-bispectrum, which, after canonical gauge-fixing to eliminate continuous degrees of freedom, provides complete invariants for the residual permutation symmetry $S_h$. We introduce a selective variant achieving $O(h)$ complexity for permutation group discrimination after canonicalization. Comprehensive experiments validate our theory: gauge transformations preserve outputs to machine precision (relative error bounded by $15\,\varepsilon_{\mathrm{mach}}$), the bispectrum achieves 100% discrimination between non-equivalent canonicalized models, and the selective variant provides a 42.7× speedup for permutation group analysis. These results establish foundational tools for model comparison, optimization analysis, and understanding the true complexity of Transformer representations.
Submission Number: 17
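
The gauge symmetry described in the abstract can be illustrated with a small numerical sketch. It is not the authors' code. It assumes one common parameterization of the group action: for each head, an invertible matrix A acts as W_Q → W_Q A and W_K → W_K A^{-T}, an invertible matrix B acts as W_V → W_V B and W_O → B^{-1} W_O, and a permutation reorders the heads. The paper's own conventions may differ. All function names (`attention`, `gauge_transform`, `gauge_dim`) and the toy sizes are hypothetical. The sketch also counts the continuous gauge dimensions as h(d_k² + d_v²); with the assumed configuration h = 12, d_k = d_v = 64 this gives 98,304, which is consistent with the "98K+" figure in the TL;DR, although the source does not state which configuration it used.

```python
import math
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (adequate for small demos)."""
    n = len(A)
    M = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def softmax_rows(S):
    out = []
    for row in S:
        m = max(row)
        e = [math.exp(x - m) for x in row]
        z = sum(e)
        out.append([x / z for x in e])
    return out

def rand_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def near_identity(rng, d):
    # Assumption: a well-conditioned invertible matrix, used only for the demo.
    return [[(i == j) + 0.1 * rng.gauss(0.0, 1.0) for j in range(d)] for i in range(d)]

def attention(X, heads, d_k):
    """Multi-head attention written as a sum over per-head (W_Q, W_K, W_V, W_O)."""
    Y = [[0.0] * len(X[0]) for _ in X]
    for W_Q, W_K, W_V, W_O in heads:
        Q, K, V = matmul(X, W_Q), matmul(X, W_K), matmul(X, W_V)
        scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, transpose(K))]
        H = matmul(matmul(softmax_rows(scores), V), W_O)
        Y = [[y + v for y, v in zip(yr, hr)] for yr, hr in zip(Y, H)]
    return Y

def gauge_transform(heads, As, Bs, perm):
    """Apply per-head GL(d_k) and GL(d_v) elements, then permute the heads."""
    new = []
    for (W_Q, W_K, W_V, W_O), A, B in zip(heads, As, Bs):
        new.append((matmul(W_Q, A), matmul(W_K, transpose(inverse(A))),
                    matmul(W_V, B), matmul(inverse(B), W_O)))
    return [new[i] for i in perm]

def gauge_dim(h, d_k, d_v):
    """Dimension of the continuous part (GL(d_k))^h x (GL(d_v))^h."""
    return h * (d_k ** 2 + d_v ** 2)

rng = random.Random(0)
n, d_model, d_k, d_v, h = 4, 6, 3, 3, 2  # toy sizes, chosen for illustration
X = rand_matrix(rng, n, d_model)
heads = [(rand_matrix(rng, d_model, d_k), rand_matrix(rng, d_model, d_k),
          rand_matrix(rng, d_model, d_v), rand_matrix(rng, d_v, d_model))
         for _ in range(h)]
As = [near_identity(rng, d_k) for _ in range(h)]
Bs = [near_identity(rng, d_v) for _ in range(h)]

Y0 = attention(X, heads, d_k)
Y1 = attention(X, gauge_transform(heads, As, Bs, perm=[1, 0]), d_k)
scale = max(abs(y) for row in Y0 for y in row)
rel_err = max(abs(a - b) for r0, r1 in zip(Y0, Y1) for a, b in zip(r0, r1)) / scale

print(rel_err)                 # tiny: floating-point rounding only
print(gauge_dim(12, 64, 64))   # 98304
```

The two weight sets differ in every entry, yet the layer output agrees up to rounding, which is the sense in which the abstract calls such models functionally identical. The sketch does not implement the gauge-fixing step or the G-bispectrum itself.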