Keywords: Transformers, Self-attention, Expressivity, RASP
Abstract: Previous work on the learnability of transformers — focused on examining their ability to approximate specific algorithmic patterns through training — has largely been data-driven, offering only probabilistic rather than deterministic guarantees. Expressivity, in contrast, has been explored theoretically to characterize the problems _computable_ by such architectures. These results established the Turing-completeness of transformers and investigated bounds via circuit complexity and formal logic. Standing at the crossroads of learnability and expressivity, the question remains: _can transformer architectures exactly simulate an arbitrary attention mechanism, or in particular, its underlying operations?_ In this study, we investigate the transformer encoder's ability to simulate a vanilla attention mechanism. By constructing a universal simulator $\mathcal{U}$ composed of transformer encoders, we present algorithmic solutions that replicate attention outputs and the underlying elementary matrix and activation operations via RASP, a formal framework for transformer computation. We show the existence of an algorithmically achievable, data-agnostic solution, previously known to be approximated only by learning.
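For reference, the vanilla (scaled dot-product) attention mechanism the abstract refers to can be sketched as follows; this is a minimal NumPy illustration of the standard definition $\mathrm{softmax}(QK^\top/\sqrt{d_k})V$, not code from the submission, and the function names are ours.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def vanilla_attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ V

# Illustration: a sequence of 3 tokens with head dimension 4.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((3, 4)) for _ in range(3))
out = vanilla_attention(Q, K, V)  # shape (3, 4)
```

The paper's claim concerns reproducing this computation exactly (the matrix products, scaling, and softmax) with transformer-encoder primitives expressed in RASP, rather than approximating it by gradient-based training.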
Primary Area: learning theory
Submission Number: 13509