Keywords: Transformer Architecture, Mechanistic Interpretability, Deep Learning Theory, Geometry, Symmetry, Layer Normalization, Self-Attention, Anisotropy, Manifold
TL;DR: A unified geometric framework for the Transformer architecture that re-interprets its core components as operators performing a symmetrize-and-structure cycle on a latent manifold.
Abstract: The Transformer architecture has become the foundation of modern artificial intelligence, yet the first-principles reasoning for its design remains surprisingly shallow. Standard explanations—that Layer Normalization (LN) combats "internal covariate shift" or that Multi-Head Attention (MHA) enables parallel processing—are correct but incomplete, failing to explain the deep synergy between the architecture's components. This paper proposes a new, unified framework that reframes these components through a geometric and information-theoretic lens. We argue that the Transformer block operates on a `symmetrize-and-structure` principle. The process begins with LN acting as a **Geometric Stabilizer**, an isotropic operator that projects token representations onto a fixed, (`d_model-2`)-dimensional manifold, ensuring dynamic stability. This symmetric "canvas" is then processed by MHA, which we identify as an **Anisotropic Processor** that performs **Axis-Aligned Subspace Decomposition**. A profound implication of this design is that the network is strongly incentivized to **encode meaning in its vector axes**. This is followed by the Feed-Forward Network (FFN), which we frame as a **Manifold-based Information Filter** that performs a complexity-reducing deformation to select for relevant features. This unified geometric perspective provides a more powerful, first-principles understanding of the Transformer's effectiveness and stability. It culminates in a proposal for a fully trainable, anisotropic LayerNorm, which would make the `symmetrize-and-structure` cycle architecturally consistent and biologically plausible, opening a new frontier in model design.
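To make the abstract's geometric claims concrete, the sketch below (not from the paper) does two things: it empirically checks the constraints that standard LayerNorm without affine parameters imposes on each token, zero mean and fixed norm, which together confine representations to a (`d_model-2`)-dimensional sphere, and it gives one *hypothetical* reading of the proposed fully trainable, anisotropic LayerNorm in which per-axis trainable weights enter the normalization statistics themselves. The PyTorch framing and the names `check_ln_geometry`, `AnisotropicLayerNorm`, and `stat_logits` are illustrative assumptions, not the paper's actual formulation.

```python
# Minimal sketch, assuming PyTorch. Part 1 verifies the "Geometric Stabilizer"
# claim for standard LayerNorm; part 2 is a hypothetical anisotropic variant.
import torch
import torch.nn as nn


def check_ln_geometry(d_model: int = 64, n_tokens: int = 8) -> None:
    """Empirically verify the two per-token constraints of affine-free LN."""
    ln = nn.LayerNorm(d_model, elementwise_affine=False)
    y = ln(torch.randn(n_tokens, d_model))
    # Constraint 1: zero mean -> points lie in a (d_model - 1)-dim hyperplane.
    print("max |mean|:", y.mean(dim=-1).abs().max().item())
    # Constraint 2: norm ~ sqrt(d_model) -> a sphere inside that hyperplane,
    # i.e. a (d_model - 2)-dimensional manifold.
    print("token norms:", y.norm(dim=-1))
    print("sqrt(d_model):", d_model ** 0.5)


class AnisotropicLayerNorm(nn.Module):
    """Hypothetical trainable, anisotropic normalizer (one possible reading).

    Instead of weighting every axis equally when computing the mean and
    variance, each axis gets a trainable weight, so the normalizing
    "projection" itself can become direction-dependent.
    """

    def __init__(self, d_model: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.stat_logits = nn.Parameter(torch.zeros(d_model))  # per-axis weights
        self.gain = nn.Parameter(torch.ones(d_model))
        self.bias = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.stat_logits, dim=-1)           # weights sum to 1
        mu = (w * x).sum(dim=-1, keepdim=True)                # weighted mean
        var = (w * (x - mu) ** 2).sum(dim=-1, keepdim=True)   # weighted variance
        x_hat = (x - mu) / torch.sqrt(var + self.eps)
        return self.gain * x_hat + self.bias


if __name__ == "__main__":
    check_ln_geometry()
    aln = AnisotropicLayerNorm(64)
    print(aln(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```

Note on the design choice in this sketch: at initialization the per-axis weights are uniform, so the variant reduces exactly to standard LayerNorm; training can then break the isotropy of the normalization step itself, which is one way the symmetrize-and-structure cycle could be made architecturally consistent.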
Submission Number: 46