Keywords: Efficient Transformers, Positional Encoding, Long-Context Modeling, Structured Sparsity, Length Extrapolation, Attention Mechanism
Abstract: Positional encodings are fundamental to Transformers, yet explicit methods like RoPE often incur high computational overhead and struggle with length extrapolation.
In this paper, we propose \textbf{Sco}ped \textbf{P}osition \textbf{E}ncoding (\textbf{ScoPE}), a novel framework that reimagines structured sparsity as an intrinsic position encoding mechanism.
Instead of relying on explicit arithmetic signals, ScoPE assigns exponentially distributed look-back scopes to attention heads.
We theoretically demonstrate that this simple topological constraint transforms the model into a hierarchical processor, inducing exponential Order Awareness (OA) with network depth.
Consequently, ScoPE is parameter-free and avoids the resolution decay typical of explicit methods.
Empirically, ScoPE substantially improves efficiency by masking the majority of attention computations, yielding a theoretical $8\times$ reduction in FLOPs.
Extensive evaluations on LLaMA-3-8B architectures reveal that ScoPE achieves superior native length extrapolation and robust retrieval fidelity compared to RoPE, all while substantially reducing training and inference latency.
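The scoping mechanism described in the abstract can be sketched as per-head attention masks whose look-back windows double from head to head. This is a minimal illustration, not the paper's implementation: the function name `scoped_attention_masks`, the `base_scope` parameter, and the exact doubling schedule are assumptions made for the sketch.

```python
import numpy as np

def scoped_attention_masks(seq_len: int, num_heads: int, base_scope: int = 4):
    """Build boolean attention masks with exponentially growing look-back scopes.

    Hypothetical sketch: head h may attend only to the most recent
    base_scope * 2**h tokens (inclusive of the current position), so later
    heads see exponentially wider context while early heads stay local.
    """
    i = np.arange(seq_len)[:, None]  # query positions (column vector)
    j = np.arange(seq_len)[None, :]  # key positions (row vector)
    masks = []
    for h in range(num_heads):
        scope = base_scope * (2 ** h)
        # causal (j <= i) and within this head's look-back scope
        masks.append((j <= i) & (j > i - scope))
    return np.stack(masks)  # shape: (num_heads, seq_len, seq_len)

# Fraction of the full causal attention that remains unmasked overall,
# i.e. the source of the claimed FLOP reduction.
def attention_density(masks):
    seq_len = masks.shape[-1]
    causal_entries = masks.shape[0] * seq_len * (seq_len + 1) / 2
    return masks.sum() / causal_entries
```

Because only the widest heads pay near-quadratic cost while most heads attend within short windows, the total number of score computations drops well below the dense causal baseline, which is where a constant-factor FLOP reduction of the kind quoted above would come from.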
Paper Type: Long
Research Area: LLM Efficiency
Research Area Keywords: Language Modeling, Efficient/Low-Resource Methods for NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches low compute settings-efficiency, Theory
Languages Studied: English
Submission Number: 10843