Gauge Fiber Bundle Geometry of Transformers

Published: 23 Sept 2025, Last Modified: 27 Nov 2025, NeurReps 2025 Proceedings, CC BY 4.0
Keywords: Riemannian Geometry, Gauge Symmetry, Fiber Bundle, Principal Bundle, Gauge Orbit, Transformers
TL;DR: Transformer attention admits principal fiber bundle structure with 98k-dim gauge fibers. Fisher metric induces vertical/horizontal decomposition: gradients lie entirely in horizontal subspace, explaining optimization in quotient space.
Abstract: We give a geometry-first account of Transformers with GeLU. Building on a companion NeurReps paper that completely characterizes the head-wise gauge symmetries of multi-head attention, we treat the maximal head-wise symmetry group as given and study the induced geometry on the resulting quotient of functionally distinct models. On a generic regular set of parameters, this symmetry group acts freely and properly, so the parameter space fibers over a quotient manifold with gauge orbits as fibers. We establish an Ehresmann connection using the ambient Euclidean metric, which resolves the degeneracy of the Fisher–Rao (FR) metric along gauge directions. This framework clarifies that the natural gradient is the horizontal Riesz representative of the Euclidean gradient with respect to the FR geometry on the quotient. We show the connection has generically nonzero curvature, implying path-dependent holonomy in parameter updates. We also clarify the roles of the Attention (MHA) and FFN blocks: while MHA parameters possess gauge symmetry, FFN gradients are strictly horizontal as the FFN parameters are invariant under the MHA gauge group. We turn these ideas into practical diagnostics—a gauge-aware gradient split and a small-loop holonomy estimator—and report consistency checks aligning with the theory. Architectural choices such as RoPE appear as principled gauge reductions (e.g., per-head Q/K dimension from dₖ² to dₖ).
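The gauge-aware gradient split mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function name, the toy orbit-tangent basis, and the use of a QR factorization to orthonormalize the gauge directions are all assumptions made here for illustration. The idea is simply to project the Euclidean gradient onto the tangent space of the gauge orbit (the vertical subspace) and keep the orthogonal remainder (the horizontal component, the part visible in the quotient geometry):

```python
import numpy as np

def gauge_gradient_split(grad, orbit_tangents):
    """Split a flattened parameter gradient into vertical (along the
    gauge orbit) and horizontal (Euclidean-orthogonal complement)
    components.

    grad           : (n,) flattened parameter gradient
    orbit_tangents : (n, k) matrix whose columns span the tangent
                     space of the gauge orbit at the current parameters
                     (illustrative stand-in for the symmetry directions)
    """
    # Orthonormalize the orbit tangent directions.
    Q, _ = np.linalg.qr(orbit_tangents)
    vertical = Q @ (Q.T @ grad)   # projection onto the gauge directions
    horizontal = grad - vertical  # component seen by the quotient space
    return vertical, horizontal

# Toy consistency check with a random stand-in basis.
rng = np.random.default_rng(0)
V = rng.standard_normal((10, 3))   # hypothetical orbit tangent basis
g = rng.standard_normal(10)
v, h = gauge_gradient_split(g, V)
assert np.allclose(v + h, g)                      # exact decomposition
assert np.allclose(V.T @ h, 0.0, atol=1e-10)      # horizontal ⟂ orbit
```

Under this sketch, the paper's claim that FFN gradients are strictly horizontal corresponds to `vertical` vanishing identically for FFN parameters, since the MHA gauge group does not act on them.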
Submission Number: 15