Keywords: length generalization, theory
TL;DR: A new framework for analyzing and proving length generalization bounds.
Abstract: _Length generalization_, the ability of sequence models to
generalize to sequences longer than those encountered during
training, remains a key challenge for transformers,
especially in tasks requiring algorithmic reasoning. Existing
theoretical understanding of length generalization is limited, often providing
only asymptotic results or focusing on specific problem classes or
architectural variants, while empirical approaches frequently rely on
ad hoc and often fragile techniques.
In this work we introduce a novel framework for analyzing and
proving length generalization bounds under specified, verifiable assumptions. A key
outcome of the theory is the identification of a natural set of
_auxiliary_ tasks, intricately related to the primary task structure,
such that strong performance on these auxiliary tasks, alongside the
primary task, provably guarantees length generalization within the framework. This
motivates a multi-task training procedure that explicitly optimizes
performance on both the primary and the identified auxiliary tasks.
Empirical evaluations on a variety of synthetic benchmarks
known to be challenging for length generalization, including sequence
sorting and reversal, demonstrate that our proposed method yields
significant improvements in generalization to substantially longer
sequences.
Supplementary Material: zip
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 24918