Length Generalization with Log-Depth Recurrent Units

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: length generalization, log-depth recurrent unit, ldru, regular languages, reduction, automata theory, monoids
Abstract: Length generalization remains a persistent challenge for neural networks: recurrent models tend to suffer from positional biases, while Transformers are constrained by fixed computational depth. Regular languages are a frequently used testbed for evaluating length generalization, since the label of any sequence can be verified exactly. We propose the Log-Depth Recurrent Unit (LDRU), which composes token embeddings through a learned pairwise operator inspired by monoid composition, yielding uniform logarithmic depth across tokens. On 21 regular tasks, consisting of standard benchmarks and new prefix languages, the LDRU achieves 100% out-of-distribution accuracy on 18 tasks and at least 96% on the remaining 3, consistently outperforming recurrent and attention-based models. These results establish the LDRU as an effective architecture for length generalization on regular languages and a promising direction for compositional sequence modeling.
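To make the log-depth composition idea concrete, below is a minimal sketch of a balanced pairwise reduction over token embeddings, assuming a PyTorch-style setup. All names (`PairwiseOp`, `LogDepthReduce`, `d_model`, the learned padding element) are hypothetical illustrations of the abstract's description, not the authors' implementation.

```python
# A minimal sketch: a learned binary operator applied in a balanced
# tree over the sequence, giving O(log n) depth, as in a monoid fold.
# This is an assumption-laden illustration, not the paper's LDRU code.
import torch
import torch.nn as nn


class PairwiseOp(nn.Module):
    """Learned binary operator composing two embeddings into one."""

    def __init__(self, d_model: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * d_model, d_model),
            nn.ReLU(),
            nn.Linear(d_model, d_model),
        )

    def forward(self, left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
        # Concatenate the pair along the feature dimension and mix.
        return self.mlp(torch.cat([left, right], dim=-1))


class LogDepthReduce(nn.Module):
    """Reduces (batch, seq_len, d_model) to (batch, d_model) by
    repeatedly combining adjacent pairs: ceil(log2(n)) rounds total."""

    def __init__(self, d_model: int):
        super().__init__()
        self.op = PairwiseOp(d_model)
        # Learned identity-like element used to pad odd-length rounds.
        self.pad = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        while x.size(1) > 1:
            if x.size(1) % 2 == 1:  # pad to even length before pairing
                pad = self.pad.expand(x.size(0), 1, -1)
                x = torch.cat([x, pad], dim=1)
            # Combine element 2i with element 2i+1 in parallel.
            x = self.op(x[:, 0::2], x[:, 1::2])
        return x[:, 0]


if __name__ == "__main__":
    reducer = LogDepthReduce(d_model=32)
    tokens = torch.randn(4, 21, 32)  # batch of 4 sequences, length 21
    pooled = reducer(tokens)         # (4, 32): one vector per sequence
    print(pooled.shape)
```

Because every round halves the sequence, depth grows logarithmically rather than linearly with length, which is the property the abstract credits for length generalization.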
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 11250