Keywords: LLM, long context, RoPE
Abstract: Extending the context length of large language models (LLMs) remains challenging, especially when models are expected to preserve reasoning performance as sequence length increases. Many existing methods extend context by modifying rotary positional embeddings (RoPE). However, these approaches typically impose the same positional treatment across all layers and do not account for the hierarchical nature of representation formation inside the model.
We present an Anchor-and-Reason view of long-context processing that emphasizes layer-wise functional differences. Specifically, we posit two regimes. In earlier layers, the model primarily performs an anchoring operation: accurate and sufficiently strong positional signals help organize long input sequences and support the formation of local semantic representations. In later layers, the model increasingly shifts to reasoning: it integrates intermediate representations to support global composition and deduction, where overly rigid positional constraints can become limiting.
Based on this perspective, we propose Layer-Scaling for Position (LASP), a simple layer-dependent adjustment of positional strength. LASP maintains higher-frequency positional components in shallow layers to stabilize sequence-to-semantics mapping, while progressively reducing positional intensity in deeper layers via exponential decay, allowing higher layers to operate with fewer positional restrictions. Experiments on a range of long-context benchmarks show that LASP yields consistent improvements over strong baselines.
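The abstract does not give the exact formula, so the following is a minimal sketch of what a LASP-style layer-dependent adjustment *could* look like: RoPE angles at full strength in the shallowest layer, with the positional intensity damped by an exponential decay toward deeper layers. The function names, the decay rate `alpha`, and the per-layer scalar form are all illustrative assumptions, not the paper's actual parameterization.

```python
import numpy as np

def layer_position_scale(layer_idx, num_layers, alpha=2.0):
    # Hypothetical LASP-style scale: 1.0 at the first layer,
    # decaying exponentially toward the last layer.
    # `alpha` is an assumed decay-rate hyperparameter.
    return float(np.exp(-alpha * layer_idx / max(num_layers - 1, 1)))

def rope_angles(positions, dim, base=10000.0, scale=1.0):
    # Standard RoPE rotation angles theta_{p,i} = p * base^(-2i/dim),
    # with overall positional intensity damped by `scale`.
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # shape (dim/2,)
    return scale * np.outer(positions, inv_freq)       # shape (seq, dim/2)

positions = np.arange(8)
num_layers = 24
shallow = rope_angles(positions, dim=16,
                      scale=layer_position_scale(0, num_layers))
deep = rope_angles(positions, dim=16,
                   scale=layer_position_scale(num_layers - 1, num_layers))
```

Under these assumptions, shallow layers keep the full high-frequency positional signal, while the deepest layer sees the same angles uniformly shrunk by a factor of `exp(-alpha)`, loosening positional constraints where the abstract argues reasoning dominates.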
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: continual learning, fine-tuning
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 6287