Accelerating Transformer Training: Architectural Symmetry, Positional Encoding, and Teleportation

14 Sept 2025 (modified: 08 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: transformer, functional equivalence
TL;DR: This work presents a systematic study of teleportation in Transformer-based models.
Abstract: As neural architectures continue to grow in complexity and scale, the development of advanced optimization techniques has become increasingly important. Teleportation has recently emerged as a principled approach for accelerating the convergence of gradient descent-based algorithms by traversing loss-invariant level sets to identify parameterizations with favorable geometric properties. Although prior teleportation methods have achieved notable success in feedforward and convolutional networks, extending these techniques to Transformer architectures presents unique challenges. In particular, existing approaches typically assume the symmetry structure of vanilla attention, overlooking the critical role of positional encodings, which fundamentally reshape architectural symmetries and render earlier analyses inapplicable. To address this gap, we present a systematic study of teleportation in Transformer-based models. We first characterize how the architectural symmetry of multi-head attention is modified under two widely used positional encoding schemes (sinusoidal and rotary) and provide a comprehensive description of the resulting symmetry groups. Guided by these insights, we introduce a teleportation framework tailored to Transformers and evaluate its effectiveness across diverse configurations, datasets, and modalities. Our results demonstrate the versatility of teleportation, elucidate the interplay between positional encoding and architectural symmetry in Transformer optimization, and establish a foundation for the principled development of teleportation algorithms that fully exploit the symmetry structure of Transformer architectures.
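To make the level-set traversal described in the abstract concrete, here is a minimal sketch, assuming a single-head attention score map scores(X) = X W_Q (X W_K)^T and its standard GL(d) symmetry W_Q -> W_Q M, W_K -> W_K M^{-T}, which leaves the scores (and hence the loss) unchanged. The teleportation step below ascends the gradient norm along this orbit before ordinary gradient descent would resume; the helper names (loss_fn, grad_norm_sq, teleport) and the synthetic regression objective are illustrative assumptions, not code or objectives from the paper.

```python
# Illustrative sketch: teleportation along the GL(d) symmetry of vanilla
# single-head attention. W_Q -> W_Q M, W_K -> W_K M^{-T} preserves
# X W_Q W_K^T X^T, so the loss is invariant while the gradient is not.
import torch

torch.manual_seed(0)
d = 8                       # head / model dimension
X = torch.randn(16, d)      # fixed batch of token embeddings
target = torch.randn(16, 16)

W_Q = torch.randn(d, d, requires_grad=True)
W_K = torch.randn(d, d, requires_grad=True)

def loss_fn(wq, wk):
    scores = X @ wq @ (X @ wk).T          # X W_Q W_K^T X^T
    return ((scores - target) ** 2).mean()

def grad_norm_sq(wq, wk):
    # Squared gradient norm at (wq, wk), kept differentiable so it can
    # itself be optimized with respect to the group element.
    l = loss_fn(wq, wk)
    gq, gk = torch.autograd.grad(l, (wq, wk), create_graph=True)
    return (gq ** 2).sum() + (gk ** 2).sum()

def teleport(wq, wk, steps=50, lr=1e-2):
    """Search the symmetry orbit M = exp(A) for a larger gradient norm."""
    wq0, wk0 = wq.detach(), wk.detach()
    A = torch.zeros(d, d, requires_grad=True)
    opt = torch.optim.Adam([A], lr=lr)
    for _ in range(steps):
        M = torch.matrix_exp(A)
        M_invT = torch.inverse(M).T
        opt.zero_grad()
        # Maximize the gradient norm on the level set (minimize its negative).
        (-grad_norm_sq(wq0 @ M, wk0 @ M_invT)).backward()
        opt.step()
    with torch.no_grad():
        M = torch.matrix_exp(A)
        return (wq0 @ M).requires_grad_(), (wk0 @ torch.inverse(M).T).requires_grad_()

# The loss is (numerically) unchanged by the teleport; only the gradient
# geometry at the new parameterization differs.
print("loss before teleport:", loss_fn(W_Q, W_K).item())
W_Q, W_K = teleport(W_Q, W_K)
print("loss after teleport: ", loss_fn(W_Q, W_K).item())
```

In this sketch the group element is parameterized as M = exp(A) so that it stays invertible throughout the search; positional encodings (sinusoidal or rotary) would shrink or reshape the admissible set of M, which is the symmetry analysis the paper undertakes.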
Supplementary Material: zip
Primary Area: optimization
Submission Number: 5098