Keywords: Data Augmentations, Length Generalization, Transformers
TL;DR: We propose a data augmentation strategy that enables arithmetic length generalization by inserting synchronized blank spaces.
Abstract: Transformers often struggle to achieve length generalization on algorithmic tasks. To date, the most successful techniques for length generalization rely on modifications to the model architecture. Instead, we propose Aligned Blankspace Augmentation (ABA), a simple data augmentation method that zero-pads numbers and inserts blank spaces at synchronized positions across operands, demonstrating that the original Transformer architecture can achieve length generalization. Experiments show that small Transformers trained with ABA on addition problems of up to 20 digits reach high accuracy on 200-digit problems, significantly outperforming prior work. The approach also improves performance on other tasks such as sorting and multi-operand addition, and, combined with scratchpads, improves generalization on multiplication.
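For concreteness, the following is a minimal sketch of how such an augmentation might look for addition, assuming blanks are inserted at randomly chosen positions and rendered as a space token; the insertion policy, blank token, and prompt format here are illustrative assumptions, not the paper's exact recipe:

```python
# Sketch of Aligned Blankspace Augmentation (ABA) for two-operand addition.
# Assumptions (not from the abstract): blanks at random positions, a space
# as the blank token, and an "a+b=c" prompt format.
import random

BLANK = " "  # hypothetical blank token

def aba_augment(a: int, b: int, max_digits: int, n_blanks: int) -> str:
    """Zero-pad operands and answer to a fixed width, then insert blank
    tokens at the same (synchronized) positions in all three sequences."""
    width = max_digits + 1  # answer may carry one extra digit
    a_str = str(a).zfill(width)
    b_str = str(b).zfill(width)
    c_str = str(a + b).zfill(width)

    # Choose insertion positions once and reuse them for every sequence,
    # so the blanks stay aligned (synchronized) across operands and answer.
    # Inserting from the highest position down keeps earlier indices valid.
    positions = sorted(random.sample(range(width + 1), n_blanks), reverse=True)

    def insert_blanks(s: str) -> str:
        chars = list(s)
        for p in positions:
            chars.insert(p, BLANK)
        return "".join(chars)

    return f"{insert_blanks(a_str)}+{insert_blanks(b_str)}={insert_blanks(c_str)}"

# Example: a 20-digit training problem rendered with aligned blanks.
print(aba_augment(12345678901234567890, 98765432109876543210, 20, 3))
```

Because the same positions are reused for both operands and the answer, corresponding digits remain vertically aligned across sequences regardless of where the blanks fall.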
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 6350