Keywords: Rotary Position Embedding, Position Interpolation, Extrapolation, Large Language Model
TL;DR: We show that the location of the dominant frequency band is governed jointly by the base and the training sequence length.
Abstract: Rotary Position Embeddings (RoPE) are widely adopted in LLMs, and it is commonly believed that a larger base frequency $\theta$ yields better long-context performance.
In this paper, we show that a high-norm RoPE dimension, which we refer to as the “frequency band,” consistently emerges across multiple models, and we focus on this band to reveal the trade-offs inherent in RoPE.
We find that replacing the RoPE dimensions below the frequency band with NoPE during inference has little effect on performance, indicating that these lower-frequency dimensions are only weakly utilized.
We further find that the location of the frequency band depends on the RoPE base $\theta$ and the training sequence length. Moreover, the band forms early during pre-training and persists even after context extension via position interpolation.
Notably, we show that aligning $\theta$ with the training length shifts the band toward lower frequencies and improves extrapolation, whereas increasing $\theta$ enhances interpolation but reduces extrapolation, revealing a clear trade-off between interpolation and extrapolation.
We believe this work is a step toward a sharper understanding of positional embeddings in LLMs, with falsifiable diagnostics and practical guidance for choosing $\theta$ that support scaling to longer contexts.
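For intuition, the following is a minimal sketch (not the paper's diagnostic) of how the RoPE base $\theta$ and the training sequence length jointly determine which rotary dimensions complete a full period within the training context; the helper name `rope_frequency_profile`, the 128-dimensional head size, and the rotation-count heuristic are illustrative assumptions rather than the authors' method, which identifies the frequency band from dimension norms.

```python
import numpy as np

def rope_frequency_profile(base: float, training_len: int, head_dim: int = 128):
    """Per-pair RoPE angular frequencies and how many full rotations each
    rotary pair completes within the training context.

    A pair whose period exceeds the training length never wraps around during
    training -- a rough proxy for where weakly utilized low-frequency
    dimensions begin (the paper's "frequency band" is found from norms,
    not from this heuristic).
    """
    i = np.arange(head_dim // 2)              # index of each rotary pair
    freqs = base ** (-2.0 * i / head_dim)     # angular frequency theta^{-2i/d}
    periods = 2.0 * np.pi / freqs             # tokens needed for one full rotation
    rotations = training_len / periods        # rotations completed within training
    return freqs, rotations

# Example: compare a small and a large base at a 4096-token training length.
for base in (10_000.0, 500_000.0):
    _, rot = rope_frequency_profile(base, training_len=4096)
    wrapped = int((rot >= 1.0).sum())
    print(f"base={base:9.0f}: {wrapped}/64 pairs complete a full period in 4096 tokens")
```

Increasing the base lowers every pair's frequency, so fewer pairs complete a full rotation at a fixed training length, which is one way to see why the band's location depends jointly on $\theta$ and the training sequence length.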
Primary Area: foundation or frontier models, including LLMs
Submission Number: 17064