Transformers Are Optimal Effective Fields

Published: 31 Oct 2025, Last Modified: 28 Nov 2025, Venue: EurIPS 2025 Workshop PriGM, License: CC BY 4.0
Keywords: Transformer Architecture, Variational Calculus, Theoretical Mechanics
TL;DR: A proof of the optimality of the Transformer architecture from variational principles.
Abstract: Are representations in Transformers provably optimal? We present an axiomatic theory of the Transformer architecture. First, we show that a complex-valued Transformer with linear attention and linear feed-forward residual blocks is uniquely determined by a potential field governed by leading linear and interactive terms. As practical extensions of the theory, we characterize ReLU/conic/gated MLP and softmax/sparse attention via axiomatic constructions. The implications include a non-exhaustive unification of existing Transformer variants within a single formalism, and a principled foundation for future architecture search.
Submission Number: 22