Generalizing Dataset Distillation via Deep Generative Prior

George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A. Efros, Jun-Yan Zhu

Published: 01 Jan 2023, Last Modified: 15 May 2025CVPR 2023EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: Dataset Distillation aims to distill an entire dataset's knowledge into a few synthetic images. The idea is to synthe-size a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data. Despite a recent upsurge of progress in the field, existing dataset dis-tillation methods fail to generalize to new architectures and scale to high-resolution datasets. To overcome the above issues, we propose to use the learned prior from pretrained deep generative models to synthesize the distilled data. To achieve this, we present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space. Our method augments existing techniques, significantly improving cross-architecture generalization in all settings.