Keywords: position encoding, group theory
Abstract: We present GRAPE (Group Representational Position Encoding), a unified framework for positional encoding based on group actions. GRAPE brings together two families of mechanisms:
(i) \emph{multiplicative} rotations (Multiplicative GRAPE) in $\mathrm{SO}(d)$ and
(ii) \emph{additive} logit biases (Additive GRAPE) arising from unipotent actions in the general linear group $\mathrm{GL}$.
In Multiplicative GRAPE (Mul-GRAPE), a position $n\!\in\!\mathbb{Z}$ (or $t\!\in\!\mathbb{R}$) acts as $\mathbf{G}(n)=\exp(n\,\omega\,\mathbf{L})$ with a rank‑2 skew generator $\mathbf{L}=\mathbf{a}\mathbf{b}^\top{-}\mathbf{b}\mathbf{a}^\top\in\mathfrak{so}(d)$, yielding a relative, compositional, norm‑preserving map with a closed‑form matrix exponential. RoPE is recovered exactly when the $d/2$ planes are the canonical coordinate pairs with a log‑uniform spectrum. Learned commuting subspaces and compact non‑commuting mixtures strictly extend this geometry at $O(d)$ and $O(rd)$ cost per head, respectively.
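The closed-form exponential can be sketched in a few lines (a minimal NumPy illustration, not the authors' implementation; it assumes the generator vectors $\mathbf{a},\mathbf{b}$ have been orthonormalized, in which case $\mathbf{L}^2=-(\mathbf{a}\mathbf{a}^\top+\mathbf{b}\mathbf{b}^\top)$ and $\mathbf{L}^3=-\mathbf{L}$):

```python
import numpy as np

def grape_rotation(a, b, theta):
    # L = a b^T - b a^T is skew-symmetric of rank 2. With orthonormal a, b,
    # L^3 = -L, so the exponential series collapses to a Rodrigues-style
    # closed form: exp(theta L) = I + sin(theta) L + (1 - cos(theta)) L^2.
    L = np.outer(a, b) - np.outer(b, a)
    return np.eye(a.shape[0]) + np.sin(theta) * L + (1.0 - np.cos(theta)) * (L @ L)
```

Orthogonality (norm preservation) and the relative law $\mathbf{G}(m)\mathbf{G}(n)=\mathbf{G}(m+n)$ follow directly from the one-parameter group structure, and can be checked numerically.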
In Additive GRAPE, additive logits arise as rank‑1 (or low‑rank) unipotent actions, recovering ALiBi and the Forgetting Transformer (FoX) as exact special cases while preserving an exact relative law and streaming cacheability. Altogether, GRAPE supplies a principled design space for positional geometry in long‑context models, subsuming RoPE and ALiBi as special cases.
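The additive special case can be illustrated with the ALiBi recovery: a slope-scaled linear bias on the attention logits that depends only on the relative offset $i-j$ (a hedged sketch; `alibi_bias` is an illustrative helper name, not an API from the paper):

```python
import numpy as np

def alibi_bias(seq_len, slope):
    # Additive logit bias b[i, j] = -slope * (i - j) for j <= i, -inf otherwise
    # (causal mask). The bias depends only on the offset i - j, so it satisfies
    # an exact relative law and cached keys remain valid as the context grows.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return np.where(j <= i, -slope * (i - j).astype(float), -np.inf)
```

Because the bias is a function of $i-j$ alone, shifting both positions by a constant leaves the logits unchanged, which is the streaming-cacheability property noted above.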
Primary Area: learning theory
Submission Number: 20573