MGDD: A Meta Generator for Fast Dataset Distillation

Published: 21 Sept 2023, Last Modified: 02 Nov 2023
NeurIPS 2023 spotlight
Keywords: Dataset Distillation, Dataset Condensation, Efficient Learning, Conditional Generation, Meta Learning
TL;DR: We propose a meta-learning algorithm that trains a meta generator for dataset distillation, so that after a few adaptation steps, distilled datasets can be produced in a single feed-forward pass.
Abstract: Existing dataset distillation (DD) techniques typically rely on iterative strategies to synthesize condensed datasets, where the datasets before and after distillation are passed forward and backward through neural networks a massive number of times. Despite the promising results achieved, the time efficiency of prior approaches is still far from satisfactory. Moreover, when synthetic datasets of different sizes are required, the iterative training procedure has to be repeated, which is highly cumbersome and lacks flexibility. In this paper, instead of relying on time-consuming forward-backward passes, we introduce a generative approach to dataset distillation with significantly improved efficiency. Specifically, synthetic samples are produced by a generator network conditioned on the initialization of DD, while synthetic labels are obtained by solving a least-squares problem in a feature space. Our theoretical analysis shows that the error of synthetic datasets solved in the original space and then processed by any conditional generator is upper-bounded. To find a satisfactory generator efficiently, we propose a meta-learning algorithm in which a meta generator is trained on a large dataset so that only a few steps are required to adapt it to a target dataset. The meta generator is termed MGDD in our approach. Once adapted, it can handle synthetic datasets of arbitrary sizes, even those unseen during adaptation. Experiments demonstrate that the generator adapted with only a limited number of steps performs on par with state-of-the-art DD methods and yields a $22\times$ acceleration.
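To make the label-solving step concrete, below is a minimal sketch (not the authors' code) of obtaining synthetic labels via least squares in a feature space. It assumes a fixed feature extractor `phi`, already-generated synthetic inputs `x_syn`, a labeled real batch `(x_real, y_real)`, and an illustrative ridge regularizer `lam`; the specific feature map, kernel, and regularization used by MGDD may differ.

```python
import torch

def solve_synthetic_labels(phi, x_syn, x_real, y_real, lam=1e-3):
    """Illustrative sketch: choose synthetic labels by a least-squares fit in feature space."""
    f_s = phi(x_syn)    # (m, d) features of synthetic samples
    f_r = phi(x_real)   # (n, d) features of real samples
    y_r = torch.nn.functional.one_hot(y_real).float()  # (n, C) one-hot real targets

    # A kernel-ridge predictor trained on the synthetic set predicts on real data as
    #   y_hat_real = K_rs (K_ss + lam I)^{-1} y_syn  =  A @ y_syn.
    k_ss = f_s @ f_s.T                                  # (m, m) synthetic-synthetic kernel
    k_rs = f_r @ f_s.T                                  # (n, m) real-synthetic kernel
    eye = torch.eye(f_s.shape[0], device=f_s.device)
    a = k_rs @ torch.linalg.inv(k_ss + lam * eye)       # (n, m) prediction operator

    # Pick y_syn minimizing ||A @ y_syn - y_real||^2: a standard least-squares problem.
    y_syn = torch.linalg.lstsq(a, y_r).solution         # (m, C) soft synthetic labels
    return y_syn
```

Because the labels have this closed-form solution, only the synthetic images need to come from the conditional generator, which is part of what makes the feed-forward pipeline fast.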
Supplementary Material: zip
Submission Number: 5629