Well Begun is Half Done: The Importance of Initialization in Dataset Distillation

Published: 01 Jan 2024, Last Modified: 25 Sept 2025 · ECCV Workshops 2024 · CC BY-SA 4.0
Abstract: Dataset distillation aims to synthesize small yet informative datasets using deep learning optimization strategies, helping to reduce storage requirements and training costs. Through specific training objectives, models trained on these synthetic datasets can achieve performance comparable to those trained on the original, larger datasets. This technique has successfully condensed several popular datasets and shown significant potential. However, current methods face several challenges. Chief among them is the time-consuming process of generating synthetic images, which can sometimes exceed the time required to train on the original dataset. To address this challenge, we reveal an initialization dependency in dataset distillation: a well-designed initialization of the synthetic data can speed up data generation and improve the quality of the resulting training outcomes. Leveraging this insight, we develop a plug-and-play method named Initialization Improved Dataset Distillation (IIDD). This method achieved 1st place on Tiny ImageNet and 2nd place overall in The First Dataset Distillation Challenge at the ECCV 2024 Workshop, with improvements of +1.15 on CIFAR-100 and +1.77 on Tiny ImageNet over the baseline.
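The abstract does not detail IIDD's initialization procedure, so the following is only a minimal sketch of the general idea it builds on: seeding the synthetic images from sampled real examples of each class rather than from Gaussian noise before the distillation optimization begins. All names (init_synthetic_from_real, ipc) are hypothetical.

```python
import torch

def init_synthetic_from_real(images, labels, num_classes, ipc):
    """Initialize synthetic data from real per-class samples instead of noise.

    images: real dataset tensor of shape (N, C, H, W)
    labels: tensor of shape (N,) with class indices
    ipc:    number of synthetic images per class (hypothetical parameter name)
    """
    c, h, w = images.shape[1:]
    # Fallback: Gaussian-noise initialization for classes with too few samples.
    syn_images = torch.randn(num_classes * ipc, c, h, w)
    syn_labels = torch.arange(num_classes).repeat_interleave(ipc)

    for cls in range(num_classes):
        idx = (labels == cls).nonzero(as_tuple=True)[0]
        if len(idx) >= ipc:
            # Copy randomly chosen real images of this class as the starting point.
            pick = idx[torch.randperm(len(idx))[:ipc]]
            syn_images[cls * ipc:(cls + 1) * ipc] = images[pick]

    # The synthetic images are subsequently optimized by the chosen
    # distillation objective (e.g., gradient or trajectory matching).
    syn_images.requires_grad_(True)
    return syn_images, syn_labels
```

Under this (assumed) setup, the distillation loop would start from these seeded tensors rather than pure noise, which is the kind of initialization effect the abstract attributes the speedup and accuracy gains to.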