On the Measurement and Efficient Mitigation of Length Generalization Gaps in Large Language Models

ICLR 2026 Conference Submission 15652 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: large language models, length generalization
TL;DR: A theoretical analysis of length generalization, its implications, and a method inspired by the analysis.
Abstract: Large Language Models (LLMs) are typically trained on short text due to the quadratic complexity of their self-attention architectures. As a result, their performance degrades drastically on inputs longer than those encountered during training, substantially limiting their applicability to real-world tasks involving long contexts. In this paper, we rigorously establish an upper bound on length generalization in the measurement space and identify two length-related factors that limit performance. Our theory explains two recent observations: **_(i)_** out-of-distribution positions in longer contexts reduce length generalization, and **_(ii)_** fine-tuning on entire sequences is not necessary. Motivated by these insights, we propose _Virtual-context Learning_ (_VCL_), a flexible method that requires minimal modifications to most fine-tuning approaches. Experiments on various tasks show that _VCL_ allows LLMs to generalize to 4$\times$ longer context windows while preserving perplexity and improving performance on downstream tasks such as Passkey Retrieval and LongBench. _VCL_ also brings substantial efficiency gains, reducing decoding time and memory usage by up to 50\% compared with fine-tuning baselines.
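
For readers unfamiliar with the Passkey Retrieval task mentioned in the abstract, below is a minimal, self-contained sketch of such an evaluation in Python. It is purely illustrative and is not the authors' VCL implementation or their evaluation harness; the model name (`gpt2`), prompt format, filler length, and trial count are assumptions chosen only to make the sketch runnable.

```python
# Minimal passkey-retrieval style evaluation sketch (illustrative only; not the
# paper's harness). A passkey is hidden inside long filler text and the model
# is asked to recall it, probing retrieval over long contexts.
import random
from transformers import AutoTokenizer, AutoModelForCausalLM

def build_passkey_prompt(n_filler: int, passkey: str) -> str:
    """Hide a passkey inside repeated filler text and ask the model to recall it."""
    filler = "The grass is green. The sky is blue. The sun is yellow. " * n_filler
    insert_at = random.randint(0, len(filler))
    context = (
        filler[:insert_at]
        + f" The pass key is {passkey}. Remember it. "
        + filler[insert_at:]
    )
    return context + "\nWhat is the pass key? The pass key is"

def evaluate_passkey(model_name: str = "gpt2", n_trials: int = 5, n_filler: int = 50) -> float:
    """Return the fraction of trials in which the model reproduces the passkey."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    correct = 0
    for _ in range(n_trials):
        passkey = str(random.randint(10000, 99999))
        prompt = build_passkey_prompt(n_filler, passkey)
        inputs = tokenizer(prompt, return_tensors="pt")
        output = model.generate(**inputs, max_new_tokens=8, do_sample=False)
        # Decode only the newly generated tokens and check for the passkey.
        completion = tokenizer.decode(output[0, inputs["input_ids"].shape[1]:])
        correct += int(passkey in completion)
    return correct / n_trials

if __name__ == "__main__":
    print(f"passkey accuracy: {evaluate_passkey():.2f}")
```

Scaling `n_filler` pushes the prompt beyond the model's training context window, which is how length generalization gaps of the kind the paper studies typically become visible on this task.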
Primary Area: interpretability and explainable AI
Submission Number: 15652