Length Generalization with Log-Depth Recurrent Units

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: length generalization, log-depth recurrent unit, ldru, regular languages, reduction, automata theory, monoids
Abstract: Length generalization remains a persistent challenge for neural networks: recurrent models tend to suffer from positional biases, while Transformers are constrained by fixed computational depth. Regular languages are a frequently used testbed for evaluating length generalization, since the label of any sequence can be verified exactly. We propose the Log-Depth Recurrent Unit (LDRU), which composes token embeddings through a learned pairwise operator inspired by monoid composition, yielding uniform logarithmic depth across tokens. On 21 regular tasks, consisting of standard benchmarks and new prefix languages, the LDRU achieves 100% out-of-distribution accuracy on 18 tasks and at least 96% on the remaining 3, consistently outperforming recurrent and attention-based models. These results establish the LDRU as an effective architecture for length generalization on regular languages and a promising direction for compositional sequence modeling.
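To make the log-depth composition idea concrete, below is a minimal sketch of a balanced pairwise reduction over token embeddings, assuming a PyTorch-style setup. All names (`PairwiseOp`, `LogDepthReduce`, `d_model`, the learned padding element) are hypothetical illustrations of the abstract's description, not the authors' implementation.

```python
# A minimal sketch: a learned binary operator applied in a balanced
# tree over the sequence, giving O(log n) depth, as in a monoid fold.
# This is an assumption-laden illustration, not the paper's LDRU code.
import torch
import torch.nn as nn


class PairwiseOp(nn.Module):
    """Learned binary operator composing two embeddings into one."""

    def __init__(self, d_model: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * d_model, d_model),
            nn.ReLU(),
            nn.Linear(d_model, d_model),
        )

    def forward(self, left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
        # Concatenate the pair along the feature dimension and mix.
        return self.mlp(torch.cat([left, right], dim=-1))


class LogDepthReduce(nn.Module):
    """Reduces (batch, seq_len, d_model) to (batch, d_model) by
    repeatedly combining adjacent pairs: ceil(log2(n)) rounds total."""

    def __init__(self, d_model: int):
        super().__init__()
        self.op = PairwiseOp(d_model)
        # Learned identity-like element used to pad odd-length rounds.
        self.pad = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        while x.size(1) > 1:
            if x.size(1) % 2 == 1:  # pad to even length before pairing
                pad = self.pad.expand(x.size(0), 1, -1)
                x = torch.cat([x, pad], dim=1)
            # Combine element 2i with element 2i+1 in parallel.
            x = self.op(x[:, 0::2], x[:, 1::2])
        return x[:, 0]


if __name__ == "__main__":
    reducer = LogDepthReduce(d_model=32)
    tokens = torch.randn(4, 21, 32)  # batch of 4 sequences, length 21
    pooled = reducer(tokens)         # (4, 32): one vector per sequence
    print(pooled.shape)
```

Because every round halves the sequence, depth grows logarithmically rather than linearly with length, which is the property the abstract credits for length generalization.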
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 11250