A Theoretical Study of Dataset Distillation

Published: 07 Nov 2023, Last Modified: 13 Dec 2023
Venue: M3L 2023 Poster
Keywords: dataset distillation, data compression, theory
TL;DR: We study dataset distillation (DD), a form of dataset compression, from a theoretical perspective, providing algorithms when DD is possible and impossibility results when it is not.
Abstract: Modern machine learning models are often trained using massive amounts of data. Such large datasets come at a high cost in terms of both storage and computation, especially when the data will need to be used repeatedly (e.g., for neural architecture search or continual learning). _Dataset distillation_ (DD) describes the process of constructing a smaller "distilled" dataset (usually consisting of synthetic examples), such that models trained on the distilled dataset will be similar to models trained on the original dataset. In this paper, we study DD from a theoretical perspective. We show that for generalized linear models, it is possible to construct a distilled dataset with only a _single point_ which will exactly recover the model trained on the original dataset, regardless of the original number of points. We provide a specialized distillation for linear regression with size independent of the original number of points, but which perfectly reconstructs the model obtained from the original dataset with _any_ data-independent regularizer, or by combining the original dataset with any additional data. We also provide impossibility results showing that similar constructions are impossible for logistic regression, and that DD cannot be accomplished in general for kernel regression, even if the goal is only to recover a single model.
Submission Number: 46
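
Illustrative sketch: the linear regression claim in the abstract can be made concrete with a standard observation, namely that the squared loss depends on the data only through X^T X and X^T y (plus a constant that does not affect the minimizer). The snippet below is a minimal sketch under that observation and is not necessarily the construction used in the paper. It uses a reduced QR factorization to build a distilled dataset with at most d rows, where d is the feature dimension. The names `distill_least_squares` and `ridge_fit`, the ridge regularizer, and the synthetic data are assumptions introduced here for illustration.

```python
import numpy as np


def distill_least_squares(X, y):
    """Hypothetical helper: compress (X, y) to at most d rows.

    With the reduced QR factorization X = Q R, we have
    X.T @ X = R.T @ R and X.T @ y = R.T @ (Q.T @ y), so the pair
    (R, Q.T @ y) has the same squared loss as (X, y) up to a constant.
    """
    Q, R = np.linalg.qr(X, mode="reduced")
    return R, Q.T @ y


def ridge_fit(X, y, lam):
    """Ridge regression, used here as one example of a data-independent regularizer."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


# Synthetic data (assumed for illustration): n = 1000 points, d = 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=1000)

X_d, y_d = distill_least_squares(X, y)

# Same model from the original and the distilled data.
w_full = ridge_fit(X, y, lam=0.3)
w_dist = ridge_fit(X_d, y_d, lam=0.3)

# Combining with additional data: stack the extra points onto either dataset.
X_extra = rng.normal(size=(50, 5))
y_extra = rng.normal(size=50)
w_full_extra = ridge_fit(np.vstack([X, X_extra]), np.concatenate([y, y_extra]), lam=0.3)
w_dist_extra = ridge_fit(np.vstack([X_d, X_extra]), np.concatenate([y_d, y_extra]), lam=0.3)
```

In this sketch the distilled dataset has 5 rows regardless of how many points the original contains, and the fitted weights agree up to floating-point error both with the regularizer alone and after additional data is appended. Because the two losses differ only by a constant, the same argument applies to any regularizer that does not depend on the data.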