Keywords: position encoding, group theory
Abstract: We present GRAPE (Group Representational Position Encoding), a unified framework for positional encoding based on group actions. GRAPE brings together two families of mechanisms:
(i) \emph{multiplicative} rotations (Multiplicative GRAPE) in $\mathrm{SO}(d)$ and
(ii) \emph{additive} logit biases (Additive GRAPE) arising from unipotent actions in the general linear group $\mathrm{GL}$.
In Multiplicative GRAPE (Mul-GRAPE), a position $n\!\in\!\mathbb{Z}$ (or $t\!\in\!\mathbb{R}$) acts as $\mathbf{G}(n)=\exp(n\,\omega\,\mathbf{L})$ with a rank‑2 skew generator $\mathbf{L}=\mathbf{a}\mathbf{b}^\top{-}\mathbf{b}\mathbf{a}^\top\in\mathfrak{so}(d)$, yielding a relative, compositional, norm‑preserving map with a closed‑form matrix exponential. RoPE is recovered exactly when the $d/2$ planes are the canonical coordinate pairs with a log‑uniform spectrum. Learned commuting subspaces and compact non‑commuting mixtures strictly extend this geometry at $O(d)$ and $O(rd)$ cost per head, respectively.
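The closed-form exponential can be sketched in a few lines (a minimal NumPy illustration, not the authors' implementation; it assumes the generator vectors $\mathbf{a},\mathbf{b}$ have been orthonormalized, in which case $\mathbf{L}^2=-(\mathbf{a}\mathbf{a}^\top+\mathbf{b}\mathbf{b}^\top)$ and $\mathbf{L}^3=-\mathbf{L}$):

```python
import numpy as np

def grape_rotation(a, b, theta):
    # L = a b^T - b a^T is skew-symmetric of rank 2. With orthonormal a, b,
    # L^3 = -L, so the exponential series collapses to a Rodrigues-style
    # closed form: exp(theta L) = I + sin(theta) L + (1 - cos(theta)) L^2.
    L = np.outer(a, b) - np.outer(b, a)
    return np.eye(a.shape[0]) + np.sin(theta) * L + (1.0 - np.cos(theta)) * (L @ L)
```

Orthogonality (norm preservation) and the relative law $\mathbf{G}(m)\mathbf{G}(n)=\mathbf{G}(m+n)$ follow directly from the one-parameter group structure, and can be checked numerically.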
In Additive GRAPE, additive logits arise as rank‑1 (or low‑rank) unipotent actions, recovering ALiBi and the Forgetting Transformer (FoX) as exact special cases while preserving an exact relative law and streaming cacheability. Altogether, GRAPE supplies a principled design space for positional geometry in long‑context models, subsuming RoPE and ALiBi as special cases.
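The additive special case can be illustrated with the ALiBi recovery: a slope-scaled linear bias on the attention logits that depends only on the relative offset $i-j$ (a hedged sketch; `alibi_bias` is an illustrative helper name, not an API from the paper):

```python
import numpy as np

def alibi_bias(seq_len, slope):
    # Additive logit bias b[i, j] = -slope * (i - j) for j <= i, -inf otherwise
    # (causal mask). The bias depends only on the offset i - j, so it satisfies
    # an exact relative law and cached keys remain valid as the context grows.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return np.where(j <= i, -slope * (i - j).astype(float), -np.inf)
```

Because the bias is a function of $i-j$ alone, shifting both positions by a constant leaves the logits unchanged, which is the streaming-cacheability property noted above.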
Primary Area: learning theory
Submission Number: 20573