D^3: Distributional Dataset Distillation with Latent Priors

21 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Dataset Distillation, Dataset Compression, Latent Variable Models, Generative Models
TL;DR: We use Deep Latent Variable Models to distill large datasets into distributions.
Abstract: Dataset distillation, the process of condensing a dataset into a smaller synthetic version while retaining downstream predictive performance, has gained traction in diverse machine learning applications, including neural architecture search, privacy-preserving learning, and continual learning. Existing methods face challenges in scaling efficiently beyond toy datasets, and they suffer from diminishing returns as the distilled dataset size increases. We present Distributional Dataset Distillation (D$^3$), a novel approach that reframes the dataset distillation problem as a distributional one. In contrast to existing methods that distill a dataset into a finite set of real or synthetic examples, D$^3$ produces a probability distribution and a decoder from which the original dataset can be approximately regenerated. We use Deep Latent Variable Models (DLVMs) to parametrize the condensed data distribution and introduce a new training objective that combines a trajectory-matching distillation loss with a distributional discrepancy term, such as Maximum Mean Discrepancy, to encourage alignment between the original and distilled distributions. Experimental results across various computer vision datasets show that our method distills effectively with minimal performance degradation. Even on large high-resolution datasets like ImageNet, our method consistently outperforms sample-based distillation methods.
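The abstract describes a combined objective (trajectory-matching loss plus an MMD term between real data and samples regenerated from the latent prior). The paper's actual implementation is not shown here; the following is a minimal illustrative sketch under assumed interfaces, where `prior`, `decoder`, and `trajectory_matching_loss` are hypothetical placeholders and the RBF-kernel MMD estimator is one common choice of discrepancy.

```python
# Hypothetical sketch of a D^3-style combined objective (not the authors' code):
# a trajectory-matching distillation loss on decoded samples plus an MMD term
# aligning decoded samples with a batch of real data.
import torch

def rbf_mmd(x, y, sigma=1.0):
    """Biased MMD^2 estimate with an RBF kernel between two batches of flattened samples."""
    def k(a, b):
        d2 = torch.cdist(a, b).pow(2)
        return torch.exp(-d2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def d3_objective(real_batch, prior, decoder, trajectory_matching_loss, lam=1.0):
    """Combined loss: trajectory matching on regenerated samples + MMD alignment term."""
    z = prior.sample((real_batch.shape[0],))   # draw latents from the learned prior
    synth = decoder(z)                         # regenerate synthetic examples
    tm = trajectory_matching_loss(synth)       # e.g. an MTT-style loss against expert trajectories
    mmd = rbf_mmd(synth.flatten(1), real_batch.flatten(1))
    return tm + lam * mmd
```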
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4136