Keywords: transformers, LLM theory, length generalization
TL;DR: We provide quantitative bounds on the length of sequences required to be observed during training for a transformer to length generalize.
Abstract: We provide quantitative bounds on the length of sequences required to be observed during training for a transformer to length generalize, i.e., to continue to perform well on sequences longer than those seen during training. Our results improve on Huang et al. (2024), who show that there is a finite training length beyond which length generalization is guaranteed, but who do not provide quantitative bounds.
Student Paper: No
Submission Number: 18