Quantitative Bounds for Length Generalization in Transformers

Published: 10 Jun 2025, Last Modified: 15 Jul 2025 · MOSS@ICML 2025 · CC BY 4.0
Keywords: transformers, LLM theory, length generalization
TL;DR: We provide quantitative bounds on the sequence lengths that must be observed during training for a transformer to length generalize, i.e., to continue to perform well on longer sequences unseen during training.
Abstract: We provide quantitative bounds on the sequence lengths that must be observed during training for a transformer to length generalize, i.e., to continue to perform well on longer sequences unseen during training. Our results improve on Huang et al. (2024), who show that there is a finite training length beyond which length generalization is guaranteed, but who do not provide quantitative bounds.
Code: zip
Submission Number: 17