Keywords: Rotary Position Embedding, Position Interpolation, Extrapolation, Large Language Model
TL;DR: We show that the location of the dominant frequency band is governed jointly by the base and the training sequence length.
Abstract: Rotary Position Embeddings (RoPE) are widely adopted in LLMs, and it is commonly believed that a larger base frequency $\theta$ yields better long-context performance.
In this paper, we show that a high-norm RoPE dimension, which we refer to as the “frequency band,” consistently emerges across multiple models, and we focus on this band to reveal the trade-offs inherent in RoPE.
We find that replacing the RoPE dimensions below the frequency band with NoPE during inference has little effect on performance, indicating that these lower-frequency dimensions are only weakly utilized.
We further find that the location of the frequency band depends on the RoPE base $\theta$ and the training sequence length. Moreover, the band forms early during pre-training and persists even after context extension via position interpolation.
Notably, we show that aligning $\theta$ with the training length shifts the band toward lower frequencies and improves extrapolation, whereas increasing $\theta$ enhances interpolation but reduces extrapolation, revealing a clear trade-off between interpolation and extrapolation.
We believe this work is a step toward a sharper understanding of positional embeddings in LLMs, with falsifiable diagnostics and practical guidance for choosing $\theta$ that support scaling to longer contexts.
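For intuition, the following is a minimal sketch (not the paper's diagnostic) of how the RoPE base $\theta$ and the training sequence length jointly determine which rotary dimensions complete a full period within the training context; the helper name `rope_frequency_profile`, the 128-dimensional head size, and the rotation-count heuristic are illustrative assumptions rather than the authors' method, which identifies the frequency band from dimension norms.

```python
import numpy as np

def rope_frequency_profile(base: float, training_len: int, head_dim: int = 128):
    """Per-pair RoPE angular frequencies and how many full rotations each
    rotary pair completes within the training context.

    A pair whose period exceeds the training length never wraps around during
    training -- a rough proxy for where weakly utilized low-frequency
    dimensions begin (the paper's "frequency band" is found from norms,
    not from this heuristic).
    """
    i = np.arange(head_dim // 2)              # index of each rotary pair
    freqs = base ** (-2.0 * i / head_dim)     # angular frequency theta^{-2i/d}
    periods = 2.0 * np.pi / freqs             # tokens needed for one full rotation
    rotations = training_len / periods        # rotations completed within training
    return freqs, rotations

# Example: compare a small and a large base at a 4096-token training length.
for base in (10_000.0, 500_000.0):
    _, rot = rope_frequency_profile(base, training_len=4096)
    wrapped = int((rot >= 1.0).sum())
    print(f"base={base:9.0f}: {wrapped}/64 pairs complete a full period in 4096 tokens")
```

Increasing the base lowers every pair's frequency, so fewer pairs complete a full rotation at a fixed training length, which is one way to see why the band's location depends jointly on $\theta$ and the training sequence length.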
Primary Area: foundation or frontier models, including LLMs
Submission Number: 17064