Embarrassingly Simple Dataset Distillation

Published: 28 Oct 2023, Last Modified: 30 Nov 2023, WANT@NeurIPS 2023 Poster
Keywords: Dataset Distillation, Data Condensation
TL;DR: We propose a simple new method for dataset distillation that achieves state-of-the-art results on the vast majority of benchmarks.
Abstract: Training large-scale models generally requires enormous amounts of training data. Dataset distillation aims to extract a small set of synthetic training samples from a large dataset such that a model trained on this set achieves competitive performance on test data, thus reducing both dataset size and training time. In this work, we tackle dataset distillation at its core by treating it directly as a bilevel optimization problem. Re-examining the foundational backpropagation-through-time approach, we study its pronounced gradient variance, computational burden, and long-range dependencies, and introduce an improved method, Random Truncated Backpropagation Through Time (RaT-BPTT), to address them. RaT-BPTT couples truncation with a random window, effectively stabilizing the gradients and speeding up optimization while still covering long-range dependencies. This allows us to establish a new dataset distillation state of the art on a variety of standard benchmarks.
Submission Number: 30
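
As a rough illustration of the bilevel structure and the random truncation described in the abstract, the following is a minimal PyTorch-style sketch. It assumes one plausible reading of "truncation coupled with a random window" (randomize the unroll length and backpropagate only through its final steps); all names (`make_model`, `inner_lr`, `max_unroll`, `window`) are illustrative assumptions and are not taken from the authors' code.

```python
# Hypothetical sketch of a RaT-BPTT-style outer gradient for dataset
# distillation; not the authors' implementation.
import torch
import torch.nn.functional as F
from torch.func import functional_call


def rat_bptt_outer_grad(distilled_x, distilled_y, real_x, real_y, make_model,
                        max_unroll=40, window=10, inner_lr=0.01):
    """Gradient of the outer (real-data) loss w.r.t. the distilled images.

    `distilled_x` is expected to be a leaf tensor with requires_grad=True,
    and `make_model` a callable returning a freshly initialized, buffer-free
    network. The inner loop trains that network on the distilled data with
    plain SGD; the unroll length is drawn at random and only the final
    `window` steps are kept in the computation graph, so backpropagation
    through time is both truncated (cheaper, lower-variance gradients) and
    randomly positioned (covering dependencies across the whole trajectory).
    """
    n_unroll = torch.randint(window, max_unroll + 1, (1,)).item()
    model = make_model()
    params = {k: v.detach() for k, v in model.named_parameters()}

    # Burn-in steps outside the backprop window: ordinary SGD, no graph kept.
    for _ in range(n_unroll - window):
        leaves = {k: v.requires_grad_(True) for k, v in params.items()}
        loss = F.cross_entropy(
            functional_call(model, leaves, (distilled_x.detach(),)),
            distilled_y.detach())
        grads = torch.autograd.grad(loss, list(leaves.values()))
        params = {k: (v - inner_lr * g).detach()
                  for (k, v), g in zip(leaves.items(), grads)}

    # Steps inside the window: keep the graph so the outer gradient can flow
    # back through these parameter updates into the distilled data.
    params = {k: v.requires_grad_(True) for k, v in params.items()}
    for _ in range(window):
        loss = F.cross_entropy(
            functional_call(model, params, (distilled_x,)), distilled_y)
        grads = torch.autograd.grad(loss, list(params.values()),
                                    create_graph=True)
        params = {k: v - inner_lr * g
                  for (k, v), g in zip(params.items(), grads)}

    # Outer objective: perform well on real data after inner training.
    outer_loss = F.cross_entropy(
        functional_call(model, params, (real_x,)), real_y)
    # (If soft labels are also learned, add distilled_y to the grad targets.)
    return torch.autograd.grad(outer_loss, distilled_x)[0]
```

In practice this outer gradient would feed an optimizer such as Adam over `distilled_x` (typically initialized from real images), with the inner network re-sampled at every outer step; the sketch omits buffers (e.g., BatchNorm running statistics) and learnable soft labels for brevity.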