On the Measurement and Efficient Mitigation of Length Generalization Gaps in Large Language Models

ICLR 2026 Conference Submission 15652 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: large language models, length generalization
TL;DR: A theoretical analysis of length generalization, its implications, and a method inspired by the analysis.
Abstract: Large Language Models (LLMs) are typically trained on short text due to the quadratic complexity of their self-attention architectures. As a result, their performance degrades drastically on inputs longer than those encountered during training, substantially limiting their applicability to real-world tasks involving long contexts. In this paper, we rigorously establish an upper bound on length generalization in the measurement space and identify two length-related factors that limit performance. Our theory explains two recent observations: **_(i)_** out-of-distribution positions in longer contexts reduce length generalization, and **_(ii)_** fine-tuning on entire sequences is not necessary. Motivated by these insights, we propose _Virtual-context Learning_ (_VCL_), a flexible method that requires minimal modifications to most fine-tuning approaches. Experiments on various tasks show that _VCL_ allows LLMs to generalize to 4$\times$ longer context windows while preserving perplexity and improving performance on downstream tasks such as Passkey Retrieval and LongBench. _VCL_ also brings substantial efficiency gains, reducing decoding time and memory usage by up to 50\% compared with fine-tuning baselines.
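
For readers unfamiliar with the Passkey Retrieval task mentioned in the abstract, below is a minimal, self-contained sketch of such an evaluation in Python. It is purely illustrative and is not the authors' VCL implementation or their evaluation harness; the model name (`gpt2`), prompt format, filler length, and trial count are assumptions chosen only to make the sketch runnable.

```python
# Minimal passkey-retrieval style evaluation sketch (illustrative only; not the
# paper's harness). A passkey is hidden inside long filler text and the model
# is asked to recall it, probing retrieval over long contexts.
import random
from transformers import AutoTokenizer, AutoModelForCausalLM

def build_passkey_prompt(n_filler: int, passkey: str) -> str:
    """Hide a passkey inside repeated filler text and ask the model to recall it."""
    filler = "The grass is green. The sky is blue. The sun is yellow. " * n_filler
    insert_at = random.randint(0, len(filler))
    context = (
        filler[:insert_at]
        + f" The pass key is {passkey}. Remember it. "
        + filler[insert_at:]
    )
    return context + "\nWhat is the pass key? The pass key is"

def evaluate_passkey(model_name: str = "gpt2", n_trials: int = 5, n_filler: int = 50) -> float:
    """Return the fraction of trials in which the model reproduces the passkey."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    correct = 0
    for _ in range(n_trials):
        passkey = str(random.randint(10000, 99999))
        prompt = build_passkey_prompt(n_filler, passkey)
        inputs = tokenizer(prompt, return_tensors="pt")
        output = model.generate(**inputs, max_new_tokens=8, do_sample=False)
        # Decode only the newly generated tokens and check for the passkey.
        completion = tokenizer.decode(output[0, inputs["input_ids"].shape[1]:])
        correct += int(passkey in completion)
    return correct / n_trials

if __name__ == "__main__":
    print(f"passkey accuracy: {evaluate_passkey():.2f}")
```

Scaling `n_filler` pushes the prompt beyond the model's training context window, which is how length generalization gaps of the kind the paper studies typically become visible on this task.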
Primary Area: interpretability and explainable AI
Submission Number: 15652