Gating Enables Curvature: A Geometric Expressivity Gap in Attention

30 Apr 2026 (modified: 09 May 2026) · ICML 2026 Workshop CoLoRAI Submission · CC BY 4.0
Keywords: Attention, Multiplicative Gating, Fisher–Rao Geometry, Representation Geometry, Curvature, Geometric Expressivity
TL;DR: Multiplicative gating breaks the affine structure of ungated attention, enabling non-flat Fisher–Rao representation geometries with curvature that can amplify under depth and improve nonlinear task performance.
Abstract: Multiplicative gating is widely used in neural architectures, but its use in attention is recent and its geometric role remains unclear. We model attention outputs as Gaussian means and study their Fisher–Rao geometry. At the operator level, ungated attention induces flat manifolds through affine value mixing, while gating enables curved geometries, including positive curvature. This reveals a geometric expressivity gap. Furthermore, we identify a structured regime where curvature accumulates under composition, leading to a systematic amplification effect with depth. Empirically, gated models show higher curvature and perform better on nonlinear tasks, with no consistent gains on linear ones.
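The affine-vs-gated distinction in the abstract can be illustrated numerically: for fixed queries and keys, ungated attention mixes the value rows linearly, so superposition holds, whereas an input-dependent multiplicative gate breaks it. The sketch below is a minimal NumPy illustration; the sigmoid gate `g = sigmoid(V @ Wg)` and the weight `Wg` are hypothetical choices for demonstration, not the paper's exact gating mechanism.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(V, Q, K):
    # Ungated attention: for fixed Q, K the output is a linear
    # (row-stochastic) mixing of the value rows.
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return A @ V

def gated_attention(V, Q, K, Wg):
    # Multiplicative gating: an input-dependent elementwise gate
    # (hypothetical sigmoid gate, for illustration only).
    g = 1.0 / (1.0 + np.exp(-(V @ Wg)))
    return g * attention(V, Q, K)

rng = np.random.default_rng(0)
d = 4
Q, K = rng.standard_normal((3, d)), rng.standard_normal((3, d))
V1, V2 = rng.standard_normal((3, d)), rng.standard_normal((3, d))
Wg = rng.standard_normal((d, d))

# Ungated attention is linear in V: superposition holds.
lin = np.allclose(attention(V1 + V2, Q, K),
                  attention(V1, Q, K) + attention(V2, Q, K))
# The input-dependent gate destroys this affine structure.
nonlin = not np.allclose(
    gated_attention(V1 + V2, Q, K, Wg),
    gated_attention(V1, Q, K, Wg) + gated_attention(V2, Q, K, Wg))
print(lin, nonlin)  # True True
```

This linearity in the values is what confines ungated attention to flat Fisher–Rao geometries in the paper's Gaussian-mean model; the gate's input dependence is the ingredient that admits curvature.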
Submission Number: 19