Repeated examples help learn arithmetic

Published: 10 Oct 2024, Last Modified: 31 Oct 2024, Venue: MATH-AI 24, License: CC BY 4.0
Keywords: Transformers, arithmetic, learning
TL;DR: On two arithmetic tasks, GCD and modular multiplication, models trained on small sets of repeated examples outperform models trained on larger sets of single-use examples.
Abstract: We study small transformers trained on two arithmetic problems, the greatest common divisor (GCD) and modular multiplication, and show that models trained on a limited set of repeated examples achieve better performance than models trained on unlimited data. In fact, modular multiplication is only learned on small training sets. We also demonstrate that two-set training (repeated use of a small random subset of examples, alongside normal sampling from the rest of the training set) yields faster learning and better performance. These experiments highlight that the benefits of repetition can outweigh those of data diversity, and shed light on the still poorly understood interplay between generalization and memorization in deep learning.
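The data regimes compared in the abstract can be made concrete with a small sampling sketch. It is not the authors' code: the function names, the operand range `max_int`, the `modulus`, the subset size `repeated_size` and the probability `p_repeated` are all hypothetical placeholders, not the paper's settings. `fixed_set_stream` models a limited data budget (a fixed set reused throughout training), and `two_set_stream` models two-set training. As a simplifying assumption, "the rest of the training set" is approximated by generating fresh examples, i.e. an unlimited training set; with a finite set one would sample from that set instead.

```python
import math
import random
from itertools import islice
from typing import Callable, Iterator, List, Tuple

# An example is (operands, target). All names below are hypothetical.
Example = Tuple[Tuple[int, int], int]
Generator = Callable[[random.Random], Example]


def gcd_example(rng: random.Random, max_int: int = 10**6) -> Example:
    """GCD task: operands (a, b), target gcd(a, b). max_int is a placeholder."""
    a, b = rng.randint(1, max_int), rng.randint(1, max_int)
    return (a, b), math.gcd(a, b)


def modmul_example(rng: random.Random, modulus: int = 67,
                   max_int: int = 10**6) -> Example:
    """Modular multiplication task: target (a * b) mod modulus. Placeholder settings."""
    a, b = rng.randint(1, max_int), rng.randint(1, max_int)
    return (a, b), (a * b) % modulus


def fixed_set_stream(generate: Generator, set_size: int,
                     seed: int = 0) -> Iterator[Example]:
    """Limited data budget: draw set_size examples once, then resample them forever."""
    if set_size < 1:
        raise ValueError("need set_size >= 1")
    rng = random.Random(seed)
    train_set: List[Example] = [generate(rng) for _ in range(set_size)]
    while True:
        yield rng.choice(train_set)


def two_set_stream(generate: Generator, repeated_size: int, p_repeated: float,
                   seed: int = 0) -> Iterator[Example]:
    """Two-set training: with probability p_repeated, reuse an example from a small
    fixed subset; otherwise generate a fresh (single-use) example."""
    if repeated_size < 1 or not 0.0 <= p_repeated <= 1.0:
        raise ValueError("need repeated_size >= 1 and 0 <= p_repeated <= 1")
    rng = random.Random(seed)
    repeated: List[Example] = [generate(rng) for _ in range(repeated_size)]
    while True:
        if rng.random() < p_repeated:
            yield rng.choice(repeated)  # example from the small repeated subset
        else:
            yield generate(rng)         # fresh example, seen (almost surely) once


if __name__ == "__main__":
    # Print the first few examples of a two-set stream for modular multiplication.
    stream = two_set_stream(modmul_example, repeated_size=1000, p_repeated=0.25)
    for operands, target in islice(stream, 5):
        print(operands, target)
```

Setting `p_repeated=0.0` recovers training on unlimited single-use data, and `p_repeated=1.0` collapses to training on the small subset alone, so the two-set regime sits between the two settings the abstract contrasts.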
Concurrent Submissions: Longer version submitted to ICLR 2025
Submission Number: 77