Extending the Context of Pretrained LLMs by Dropping Their Positional Embedding

Published: 26 Jan 2026, Last Modified: 26 Feb 2026 · ICLR 2026 Poster · CC BY 4.0
Keywords: LMs, Long Context, Positional Embeddings, Architecture
TL;DR: We extend the context of pretrained LMs by dropping their positional embeddings after training.
Abstract: So far, expensive finetuning beyond the pretraining sequence length has been a prerequisite to effectively extend the context of language models (LMs). In this work, we break this key bottleneck by ***Dro**pping the **P**ositional **E**mbeddings of LMs after training (DroPE)*. Our simple method is motivated by three key theoretical and empirical observations. First, positional embeddings serve a crucial role during pretraining, providing an important inductive bias that significantly facilitates convergence. Second, over-reliance on this explicit positional information is also precisely what prevents test-time generalization to sequences of unseen length. Third, positional embeddings are not an inherent requirement of effective language modeling and can be safely *removed after pretraining* following a short recalibration phase. Empirically, DroPE yields seamless *zero-shot* context extension *without any long-context finetuning*, quickly adapting pretrained LMs without compromising their capabilities in the original training context. Our findings hold across different models and dataset sizes, with DroPE far outperforming previous specialized architectures and established rotary position embedding scaling methods.
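
To make the high-level recipe in the abstract concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' code) of what "dropping the positional embedding after pretraining, then briefly recalibrating" could look like for a single RoPE-based attention layer. All names here (`SimpleAttention`, `use_rope`, the placeholder recalibration objective) are illustrative assumptions, not details from the paper.

```python
# Illustrative sketch only: convert a RoPE attention layer to a NoPE layer
# after pretraining, then run a short recalibration phase, as DroPE describes
# at a high level. Names and the training objective are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


def apply_rope(x: torch.Tensor) -> torch.Tensor:
    """Apply rotary position embeddings to a (batch, heads, seq, head_dim) tensor."""
    b, h, t, d = x.shape
    half = d // 2
    freqs = 1.0 / (10000 ** (torch.arange(half, device=x.device) / half))
    angles = torch.arange(t, device=x.device)[:, None] * freqs[None, :]  # (seq, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


class SimpleAttention(nn.Module):
    """Single-head causal self-attention whose positional embedding can be dropped."""

    def __init__(self, dim: int, use_rope: bool = True):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)
        self.use_rope = use_rope

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (u.unsqueeze(1) for u in (q, k, v))  # add a head axis
        if self.use_rope:                              # explicit positional information
            q, k = apply_rope(q), apply_rope(k)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(y.squeeze(1))


# After (hypothetical) pretraining with RoPE, drop the positional embedding...
layer = SimpleAttention(dim=64, use_rope=True)
layer.use_rope = False  # the layer now attends without any explicit positions

# ...then run a short recalibration phase, per the abstract. The objective below
# is a stand-in; the paper's actual recalibration procedure is not shown here.
opt = torch.optim.AdamW(layer.parameters(), lr=1e-4)
for _ in range(10):
    x = torch.randn(2, 128, 64)       # placeholder batch of hidden states
    loss = layer(x).pow(2).mean()     # placeholder loss for illustration only
    opt.zero_grad(); loss.backward(); opt.step()
```

The point of the sketch is that the conversion itself is a one-line change to the attention computation; per the abstract, the substantive claims are that this removal does not degrade short-context capabilities and that, after a short recalibration, the model generalizes zero-shot to lengths beyond its pretraining context.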
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 1735