Dataset Condensation with Distribution Matching

29 Sept 2021 (modified: 22 Oct 2023) · ICLR 2022 Conference Withdrawn Submission
Keywords: Dataset Condensation, Data-efficient Learning, Distribution Matching, Continual Learning, Neural Architecture Search
Abstract: The computational cost of training state-of-the-art deep models in many learning problems is rapidly increasing due to more sophisticated models and larger datasets. A recent promising direction for reducing training time is dataset condensation, which aims to replace the original large training set with a significantly smaller learned synthetic set while preserving its information. While training deep models on the small set of condensed images can be extremely fast, synthesizing those images remains computationally expensive due to the complex bi-level optimization and second-order derivative computation involved. In this work, we propose a simple yet effective dataset condensation technique that requires significantly lower training cost with comparable performance, by matching the feature distributions of the synthetic and original training images in sampled embedding spaces. Thanks to its efficiency, we apply our method to more realistic and larger datasets with sophisticated neural architectures and achieve a significant performance boost while using larger synthetic training sets. We also show various practical benefits of our method in continual learning and neural architecture search.
One-sentence Summary: A simple dataset condensation method without bi-level optimization can achieve state-of-the-art generalization performance.
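
The sketch below illustrates the idea described in the abstract: learning synthetic images by matching the feature distributions of synthetic and real data in randomly sampled embedding spaces, with no inner training loop or second-order derivatives. It is a minimal illustration based only on the abstract; the embedding architecture, loss form, hyperparameters, and all names (`random_embedding`, `condense`, `real_loader`, etc.) are assumptions for illustration, not the authors' exact implementation.

```python
# Hypothetical sketch of distribution matching for dataset condensation.
# Assumes `real_loader` yields shuffled (image, label) batches, e.g. CIFAR-10.
import torch
import torch.nn as nn

def random_embedding(in_channels=3, feat_dim=128):
    # A randomly initialized (untrained) network serves as one sampled embedding space.
    return nn.Sequential(
        nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(),
        nn.AvgPool2d(2),
        nn.Conv2d(64, feat_dim, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

def condense(real_loader, num_classes, ipc=10, steps=1000, lr=1.0, device="cpu"):
    # Learnable synthetic set: `ipc` images per class, updated directly by gradient descent.
    syn = torch.randn(num_classes * ipc, 3, 32, 32, device=device, requires_grad=True)
    syn_labels = torch.arange(num_classes, device=device).repeat_interleave(ipc)
    opt = torch.optim.SGD([syn], lr=lr, momentum=0.5)

    for _ in range(steps):
        embed = random_embedding().to(device)  # re-sample a fresh embedding each step
        real_x, real_y = next(iter(real_loader))
        real_x, real_y = real_x.to(device), real_y.to(device)

        loss = 0.0
        for c in range(num_classes):
            real_c = real_x[real_y == c]
            syn_c = syn[syn_labels == c]
            if len(real_c) == 0:
                continue
            # Match the mean feature of real and synthetic data per class in the
            # sampled embedding space; no bi-level optimization is required.
            loss = loss + ((embed(real_c).mean(0) - embed(syn_c).mean(0)) ** 2).sum()

        opt.zero_grad()
        loss.backward()
        opt.step()

    return syn.detach(), syn_labels
```

In this sketch, the expensive unrolled inner training of the condensation network is replaced by a single forward pass through a freshly sampled random network per step, which is what makes the synthesis significantly cheaper than bi-level approaches.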
Community Implementations: [1 code implementation (CatalyzeX)](https://www.catalyzex.com/paper/arxiv:2110.04181/code)
