Emergent properties with repeated examples

Published: 10 Oct 2024, Last Modified: 09 Nov 2024, SciForDL Poster, CC BY 4.0
TL;DR: In three controlled experiments with algorithmically generated data, we show that models trained on smaller sets of repeated examples outperform models trained on larger sets of single-use examples, and we introduce two-set training to demonstrate the benefits of repetition.
Abstract: We study the performance of transformers as a function of the number of repetitions of training examples, using algorithmically generated datasets. On three mathematical problems (the greatest common divisor, modular multiplication, and matrix eigenvalues), we show that, for a fixed number of training steps, models trained on smaller sets of repeated examples outperform models trained on larger sets of single-use examples. We also demonstrate that two-set training, the repeated use of a small random subset of examples alongside normal sampling from the rest of the training set, yields faster learning and better performance. This highlights that the benefits of repetition can outweigh those of data diversity. These datasets and problems provide a controlled setting to shed light on the still poorly understood interplay between generalization and memorization in deep learning.
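To make the two-set training idea concrete, here is a minimal Python sketch of one way such a sampling scheme could look. This is an illustration under stated assumptions, not the authors' implementation: the names `generate_example`, `make_repeated_subset`, `two_set_batch`, and the mixing probability `repeat_prob` are all hypothetical, and the toy GCD generator merely stands in for the paper's data-generation procedure.

```python
import math
import random

def make_repeated_subset(n_repeated, generate_example, seed=0):
    """Draw a small fixed subset that will be reused throughout training."""
    rng = random.Random(seed)
    return [generate_example(rng) for _ in range(n_repeated)]

def two_set_batch(batch_size, repeated_subset, generate_example,
                  repeat_prob=0.25, rng=None):
    """Build a batch mixing repeated examples with freshly sampled ones.

    With probability `repeat_prob` an example is drawn from the small
    repeated subset; otherwise a new, single-use example is generated.
    """
    rng = rng or random.Random()
    batch = []
    for _ in range(batch_size):
        if rng.random() < repeat_prob:
            batch.append(rng.choice(repeated_subset))  # repeated example
        else:
            batch.append(generate_example(rng))        # single-use example
    return batch

# Toy generator for one of the paper's three tasks (greatest common divisor):
def gcd_example(rng):
    a, b = rng.randint(1, 10**6), rng.randint(1, 10**6)
    return (a, b, math.gcd(a, b))

repeated = make_repeated_subset(1000, gcd_example)
batch = two_set_batch(64, repeated, gcd_example)
```

Sampling from the repeated subset with a fixed per-example probability is just one simple way to interleave the two sets; the subset size and mixing schedule used in the paper may differ.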
Style Files: I have used the style files.
Debunking Challenge: This submission is an entry to the debunking challenge.
Submission Number: 40