Keywords: Data Augmentations, Length Generalization, Transformers
TL;DR: We propose a data augmentation strategy that enables arithmetic length generalization by inserting synchronized blank spaces.
Abstract: Transformers often struggle to achieve length generalization on algorithmic tasks. To date, the most successful techniques for length generalization rely on modifications to the model architecture. Instead, we propose Aligned Blankspace Augmentation (ABA), a simple data augmentation method that zero-pads numbers and inserts blank spaces at synchronized positions across operands, demonstrating that the original Transformer architecture can achieve length generalization. Experiments show that small Transformers trained with ABA on addition problems of up to 20 digits reach high accuracy on 200-digit problems, significantly outperforming prior work. The approach also improves performance on other tasks such as sorting and multi-operand addition, and, combined with scratchpads, improves generalization on multiplication.
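For concreteness, the following is a minimal sketch of how such an augmentation might look for addition, assuming blanks are inserted at randomly chosen positions and rendered as a space token; the insertion policy, blank token, and prompt format here are illustrative assumptions, not the paper's exact recipe:

```python
# Sketch of Aligned Blankspace Augmentation (ABA) for two-operand addition.
# Assumptions (not from the abstract): blanks at random positions, a space
# as the blank token, and an "a+b=c" prompt format.
import random

BLANK = " "  # hypothetical blank token

def aba_augment(a: int, b: int, max_digits: int, n_blanks: int) -> str:
    """Zero-pad operands and answer to a fixed width, then insert blank
    tokens at the same (synchronized) positions in all three sequences."""
    width = max_digits + 1  # answer may carry one extra digit
    a_str = str(a).zfill(width)
    b_str = str(b).zfill(width)
    c_str = str(a + b).zfill(width)

    # Choose insertion positions once and reuse them for every sequence,
    # so the blanks stay aligned (synchronized) across operands and answer.
    # Inserting from the highest position down keeps earlier indices valid.
    positions = sorted(random.sample(range(width + 1), n_blanks), reverse=True)

    def insert_blanks(s: str) -> str:
        chars = list(s)
        for p in positions:
            chars.insert(p, BLANK)
        return "".join(chars)

    return f"{insert_blanks(a_str)}+{insert_blanks(b_str)}={insert_blanks(c_str)}"

# Example: a 20-digit training problem rendered with aligned blanks.
print(aba_augment(12345678901234567890, 98765432109876543210, 20, 3))
```

Because the same positions are reused for both operands and the answer, corresponding digits remain vertically aligned across sequences regardless of where the blanks fall.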
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 6350