Emergent properties with repeated examples

Published: 10 Oct 2024, Last Modified: 09 Nov 2024, SciForDL Poster, CC BY 4.0
TL;DR: In three controlled experiments with algorithmically generated data, we show that models trained on smaller sets of repeated examples outperform models trained on larger sets of single-use examples, and we introduce two-set training to demonstrate the benefits of repetition.
Abstract: We study the performance of transformers as a function of the number of repetitions of training examples, using algorithmically generated datasets. On three mathematical problems (the greatest common divisor, modular multiplication, and matrix eigenvalues), we show that, for a fixed number of training steps, models trained on smaller sets of repeated examples outperform models trained on larger sets of single-use examples. We also demonstrate that two-set training, the repeated use of a small random subset of examples alongside normal sampling from the rest of the training set, yields faster learning and better performance. This highlights that the benefits of repetition can outweigh those of data diversity. These datasets and problems provide a controlled setting to shed light on the still poorly understood interplay between generalization and memorization in deep learning.
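To make the two-set training idea concrete, here is a minimal Python sketch of one way such a sampling scheme could look. This is an illustration under stated assumptions, not the authors' implementation: the names `generate_example`, `make_repeated_subset`, `two_set_batch`, and the mixing probability `repeat_prob` are all hypothetical, and the toy GCD generator merely stands in for the paper's data-generation procedure.

```python
import math
import random

def make_repeated_subset(n_repeated, generate_example, seed=0):
    """Draw a small fixed subset that will be reused throughout training."""
    rng = random.Random(seed)
    return [generate_example(rng) for _ in range(n_repeated)]

def two_set_batch(batch_size, repeated_subset, generate_example,
                  repeat_prob=0.25, rng=None):
    """Build a batch mixing repeated examples with freshly sampled ones.

    With probability `repeat_prob` an example is drawn from the small
    repeated subset; otherwise a new, single-use example is generated.
    """
    rng = rng or random.Random()
    batch = []
    for _ in range(batch_size):
        if rng.random() < repeat_prob:
            batch.append(rng.choice(repeated_subset))  # repeated example
        else:
            batch.append(generate_example(rng))        # single-use example
    return batch

# Toy generator for one of the paper's three tasks (greatest common divisor):
def gcd_example(rng):
    a, b = rng.randint(1, 10**6), rng.randint(1, 10**6)
    return (a, b, math.gcd(a, b))

repeated = make_repeated_subset(1000, gcd_example)
batch = two_set_batch(64, repeated, gcd_example)
```

Sampling from the repeated subset with a fixed per-example probability is just one simple way to interleave the two sets; the subset size and mixing schedule used in the paper may differ.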
Style Files: I have used the style files.
Debunking Challenge: This submission is an entry to the debunking challenge.
Submission Number: 40