Track: tiny paper (up to 4 pages)
Keywords: Transformers; Geometry of representations; Persistent homology
Abstract: We propose a topological framework to analyze the layerwise evolution of transformer representations by modeling attention heads as Markov kernels on a token metric space. This formulation admits a Wasserstein-1 ($W_1$) lifting in which coarse Ollivier-Ricci curvature provides quantitative bounds on the action of the induced operator: positive curvature implies layerwise Wasserstein contraction, while negative curvature implies expansion. To connect these statements to practice, we introduce a reproducible probe that estimates robust lower quantiles of the curvature, directly tests contraction on random measures in $W_1$, and tracks layerwise topological simplification using persistent homology on diffusion-induced distances. In pretrained GPT-2 and GPT-2-medium models, we observe a depthwise transition toward more contractive behavior, with shrinking $H_1$ lifetimes and persistence of a coarse $H_0$ skeleton.
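To make the curvature quantity in the abstract concrete, the sketch below (not the authors' released probe) computes coarse Ollivier-Ricci curvature $\kappa(i,j) = 1 - W_1(\mu_i,\mu_j)/d(i,j)$ for one attention head viewed as a Markov kernel, where $\mu_i$ is row $i$ of the attention matrix and $d$ is a Euclidean metric on token representations. The array names `attn` and `emb`, the choice of base metric, and the use of the POT library are illustrative assumptions.

```python
# Minimal sketch, assuming a row-stochastic (T, T) attention matrix `attn`
# for a single head and (T, d) token representations `emb` that define the
# base metric. Requires the POT library (pip install pot) for exact W_1.
import numpy as np
import ot


def ollivier_ricci_curvatures(attn: np.ndarray, emb: np.ndarray) -> np.ndarray:
    """Return kappa(i, j) = 1 - W_1(mu_i, mu_j) / d(i, j) over token pairs i < j."""
    T = attn.shape[0]
    # Base metric: Euclidean distances between token representations.
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    kappas = []
    for i in range(T):
        for j in range(i + 1, T):
            if d[i, j] < 1e-8:
                continue  # skip (near-)duplicate tokens
            mu_i = attn[i] / attn[i].sum()  # row i as a probability measure
            mu_j = attn[j] / attn[j].sum()
            w1 = ot.emd2(mu_i, mu_j, d)     # exact Wasserstein-1 under metric d
            kappas.append(1.0 - w1 / d[i, j])
    return np.array(kappas)


# A robust lower quantile of kappa (e.g. the 10th percentile) is one way to
# summarize the worst-case layerwise contraction/expansion rate per head.
# kappa_q10 = np.quantile(ollivier_ricci_curvatures(attn, emb), 0.10)
```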
Anonymization: This submission has been anonymized for double-blind review by removing identifying information such as names, affiliations, and identifying URLs.
Submission Number: 86