The Expressivity Limits of Transformers

Published: 14 Feb 2026, Last Modified: 14 Feb 2026
Venue: MATH4AI @ AAAI 2026 (Oral)
License: CC BY 4.0
Keywords: transformer expressivity, accessible sequences, embedding space geometry, mean-field transformers, approximation theory, finite precision
TL;DR: We prove that transformer expressivity is fundamentally limited: the maximal length of accessible sequences grows linearly with prompt length, and beyond a critical threshold, the proportion of accessible sequences decays exponentially.
Abstract: We study the fundamental expressivity limits of transformer models. We formalize the notion of accessible sequences—those that a transformer can produce for some prompt—and characterize how accessibility depends on prompt length and model precision. By partitioning the embedding space via the decoder readout into next-token argmax regions and extending transformers to a mean-field map on probability measures, we derive theoretical upper bounds on the number and length of accessible output sequences. We prove that (i) the maximal length of accessible sequences grows linearly with the prompt length, and (ii) beyond a critical threshold, the proportion of accessible sequences decays exponentially with sequence length. These bounds hold even with unbounded context and computation time, linking the expressivity limits of transformers to the geometry of their embedding space and the finiteness of their representational precision. Experiments using a “cramming” procedure confirm both the linear scaling and the post-threshold exponential decay.
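For concreteness, the argmax partition can be sketched as follows (a minimal illustration in our own notation; the symbols $d$, $V$, and $w_v$ are assumptions, as is the linearity of the readout, which the abstract does not specify). If the decoder readout scores each vocabulary token $v \in V$ by an inner product with a vector $w_v \in \mathbb{R}^d$, the embedding space decomposes into the cells

$$R_v \;=\; \{\, x \in \mathbb{R}^d \;:\; \langle w_v, x \rangle \ge \langle w_u, x \rangle \ \text{for all } u \in V \,\},$$

and a sequence $(y_1, \dots, y_T)$ is accessible precisely when some prompt drives the model's pre-readout state through $R_{y_1}, \dots, R_{y_T}$ under argmax decoding. At finite representational precision only finitely many states, and hence only finitely many cell itineraries, can be realized from prompts of a given length; this is the counting intuition behind the bounds above.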
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 36