Keywords: LLMs, long-context, length extrapolation
Abstract: The pursuit of long-context modeling capabilities for Large Language Models (LLMs) has made RoPE scaling a prominent line of research aimed at overcoming the inherent limitations of positional encoding extrapolation. Existing studies show that the high-frequency components of RoPE are sensitive to small relative distances and capture local information, while the low-frequency components respond to large relative distances and capture long-range dependencies. This observation has led to the conventional strategy of directly extrapolating the high-frequency components and interpolating the low-frequency ones. However, due to the periodic nature of trigonometric functions, appropriate interpolation of the high-frequency components can enhance their ability to capture longer-range dependencies, thereby contributing to improved long-context modeling. Building on these insights, we propose AlphaRoPE, a novel approach to RoPE-based length extrapolation. AlphaRoPE interpolates the low-frequency components to resolve out-of-distribution (OOD) issues, while for the high-frequency components it introduces a carefully calibrated interpolation factor that grows gradually as frequency decreases. This dual approach effectively extends the context length of LLMs without degrading their performance on shorter sequences. Experiments on various models further confirm our hypothesis and demonstrate the superior performance of AlphaRoPE.
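To make the frequency-dependent scheme concrete, here is a minimal sketch of one way a gradually increasing interpolation factor could be applied to the RoPE inverse frequencies. The abstract does not give the exact AlphaRoPE schedule, so the ramp shape and the parameter names `scale` and `alpha` below are assumptions for illustration, not the authors' formula.

```python
import numpy as np

def ramped_rope_inv_freq(dim: int, base: float = 10000.0,
                         scale: float = 4.0, alpha: float = 1.0) -> np.ndarray:
    """Illustrative sketch (not the AlphaRoPE formula): scale RoPE inverse
    frequencies with an interpolation factor that grows as frequency drops.

    scale: hypothetical context-extension ratio (e.g. 4x the training length).
    alpha: hypothetical shape parameter controlling how fast the factor grows.
    """
    i = np.arange(0, dim, 2)                 # even dimension indices
    inv_freq = base ** (-i / dim)            # standard RoPE inverse frequencies
    # Normalized position along the frequency axis:
    # t = 0 -> highest frequency, t = 1 -> lowest frequency.
    t = i / (dim - 2) if dim > 2 else np.zeros_like(i, dtype=float)
    # Gradually increasing interpolation: near 1 (almost direct extrapolation)
    # for high frequencies, approaching full interpolation (1/scale) for the
    # lowest frequencies.
    interp = 1.0 / (1.0 + (scale - 1.0) * t ** alpha)
    return inv_freq * interp
```

Under these assumptions, the highest-frequency dimensions keep (nearly) their original rotation rates while the lowest-frequency dimensions are compressed by the full extension ratio, matching the abstract's description of interpolating low frequencies and applying a gradually increasing interpolation factor to higher ones.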
Primary Area: foundation or frontier models, including LLMs
Submission Number: 9173