Selective Rotary Position Embedding

ICLR 2026 Conference Submission21436 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: RoPE, Linear Transformer, Attention, State Space Models, Forget Gate

TL;DR: We introduce Selective RoPE, an input-dependent rotary embedding that enhances gated linear transformers.

Abstract: Position information is essential for language modeling. In softmax transformers, Rotary Position Embeddings (RoPE) encode positions through *fixed-angle* rotations, while in linear transformers, order is handled via input-dependent (selective) gating that decays historical information. Selectivity has generally been shown to improve performance on language-related tasks. Inspired by this, we introduce **Selective RoPE**, an *input-dependent* rotary embedding mechanism that generalizes *RoPE* and enables rotation by arbitrary angles in linear transformers. We show that softmax attention already performs a hidden form of these rotations on query-key pairs, uncovering an implicit positional structure. We further show that in state-space models and gated linear transformers, the real part manages forgetting while the imaginary part encodes positions through rotations. We validate our method by equipping gated linear attention (GLA) with **Selective RoPE**, demonstrating that its input-dependent rotations improve performance in language modeling and on difficult sequence tasks such as copying, state tracking, and retrieval.
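The contrast between fixed-angle and input-dependent rotations can be illustrated with a small NumPy sketch. This is not the submission's implementation: the function names (`rope_angles`, `selective_angles`, `scores`), the complex-valued representation of query/key pairs, and the scalar per-token `log_decay` gate are all assumptions made for illustration. It only shows that accumulating per-token angle increments reduces to standard RoPE when the increments are constant, and that a real-valued decay can be layered on top of the rotation.

```python
import numpy as np

def rope_angles(seq_len, freqs):
    # Fixed-angle RoPE: position t is rotated by t * freq for each frequency.
    return np.arange(seq_len)[:, None] * freqs[None, :]

def selective_angles(omega):
    # Hypothetical input-dependent variant: omega[t] holds per-token angle
    # increments (in a real model these would be computed from the input),
    # and the rotation at step t is their running sum.
    return np.cumsum(omega, axis=0)

def scores(q, k, angles, log_decay=None):
    # q, k: (seq_len, d) complex arrays, each entry standing for one 2-D pair.
    qr = q * np.exp(1j * angles)
    kr = k * np.exp(1j * angles)
    # Entry [t, s] is Re sum_d q[t,d] * conj(k[s,d]) * exp(i * (phi[t,d] - phi[s,d])).
    s = np.real(qr @ np.conj(kr).T)
    mask = np.tril(np.ones_like(s))  # causal: keep only s <= t
    if log_decay is not None:
        # Assumed scalar forget gate: log_decay[t] <= 0, accumulated over time,
        # so the weight between t and s is the product of gates in (s, t].
        c = np.cumsum(log_decay)
        s = s * np.exp(np.minimum(c[:, None] - c[None, :], 0.0))
    return s * mask

rng = np.random.default_rng(0)
T, d = 6, 4
freqs = 1.0 / (10000.0 ** (np.arange(d) / d))
q = rng.normal(size=(T, d)) + 1j * rng.normal(size=(T, d))
k = rng.normal(size=(T, d)) + 1j * rng.normal(size=(T, d))

s_rope = scores(q, k, rope_angles(T, freqs))
# Constant increments equal to freqs give the same relative angles as RoPE.
s_const = scores(q, k, selective_angles(np.tile(freqs, (T, 1))))
# Random increments stand in for input-dependent ones.
s_sel = scores(q, k, selective_angles(rng.uniform(0, np.pi, size=(T, d))))
# Rotation combined with a constant forget gate.
s_gated = scores(q, k, rope_angles(T, freqs), log_decay=np.full(T, -0.5))
```

In this sketch the phase (imaginary exponent) carries relative position while the real exponent `log_decay` shrinks the contribution of older tokens, which is one way to read the abstract's statement about real and imaginary parts.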

Supplementary Material: zip

Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning

Submission Number: 21436
