Deliberate Practice with Synthetic Data

Published: 10 Oct 2024, Last Modified: 19 Nov 2024. AFM 2024 Poster. License: CC BY 4.0
Keywords: Synthetic data, Text-to-image models
TL;DR: We propose a framework that adapts text-to-image models for synthetic data generation, producing synthetic examples tailored to a downstream model's weaknesses throughout training.
Abstract: Deliberate practice for humans is the process of improving one's skills by leveraging external feedback while actively seeking out and correcting mistakes. The status quo in machine learning is to train models on static datasets composed of real or generated data. While state-of-the-art generative models can serve as an infinite source of synthetic data for training downstream models, prior work has shown that simply increasing the dataset size yields diminishing improvements in model accuracy. In this work, we design a framework that generates synthetic data to improve the performance of a downstream machine learning model. The framework incorporates feedback from the downstream model to refine the generated data throughout the training process. In particular, we employ deliberate practice for neural network training: we generate challenging synthetic examples tailored to the model's weaknesses at any stage of training, replacing easier, less informative examples in the dataset. With a fixed-size synthetic dataset throughout training, this approach yields accuracy improvements of over 14% on ImageNet-100 and 8% on ImageNet-1000.
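
The abstract describes a feedback loop: train on a fixed-size synthetic pool, score examples by how hard the current model finds them, and swap the easiest ones for newly generated hard examples. The sketch below is a minimal, hypothetical illustration of such a loop, not the paper's implementation: the toy generator (class-conditional Gaussians standing in for a text-to-image model), the per-example-loss difficulty score, and the refresh schedule are all assumptions made for illustration.

```python
# Illustrative sketch of a deliberate-practice training loop with synthetic data.
# The generator, difficulty score, and refresh schedule are hypothetical stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES, DIM, POOL_SIZE = 10, 32, 2000
REPLACE_FRACTION, CANDIDATE_FACTOR = 0.2, 4

torch.manual_seed(0)
class_means = torch.randn(NUM_CLASSES, DIM) * 3.0  # toy class "concepts"

def generate(labels):
    """Stand-in for a conditional generative model (e.g. a text-to-image model)."""
    return class_means[labels] + torch.randn(len(labels), DIM)

def difficulty(model, x, y):
    """Higher = harder: per-example cross-entropy under the current model (assumed metric)."""
    with torch.no_grad():
        return F.cross_entropy(model(x), y, reduction="none")

model = nn.Sequential(nn.Linear(DIM, 64), nn.ReLU(), nn.Linear(64, NUM_CLASSES))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Fixed-size synthetic pool, initialized once from the generator.
pool_y = torch.randint(0, NUM_CLASSES, (POOL_SIZE,))
pool_x = generate(pool_y)

for step in range(1, 501):
    # Standard supervised update on a minibatch drawn from the synthetic pool.
    idx = torch.randint(0, POOL_SIZE, (128,))
    loss = F.cross_entropy(model(pool_x[idx]), pool_y[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()

    # Periodically refresh the pool: drop the easiest examples and replace them
    # with newly generated candidates that the current model finds hardest.
    if step % 100 == 0:
        n_replace = int(REPLACE_FRACTION * POOL_SIZE)
        easiest = difficulty(model, pool_x, pool_y).argsort()[:n_replace]

        cand_y = torch.randint(0, NUM_CLASSES, (CANDIDATE_FACTOR * n_replace,))
        cand_x = generate(cand_y)
        hardest = difficulty(model, cand_x, cand_y).argsort(descending=True)[:n_replace]

        pool_x[easiest] = cand_x[hardest]
        pool_y[easiest] = cand_y[hardest]
        print(f"step {step}: train loss {loss.item():.3f}, refreshed {n_replace} examples")
```

The pool size stays constant throughout training; only its composition changes, which mirrors the fixed-size synthetic dataset described in the abstract.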
Submission Number: 48