Diffusion Dataset Condensation: Training Your Diffusion Model Faster with Less Data

ICLR 2026 Conference Submission 12757 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: dataset condensation, diffusion model, image generation, interval sampling, double conditional embedding, visual information injection
TL;DR: We propose the first dedicated framework for diffusion dataset condensation: training diffusion models significantly faster with dramatically less data, while retaining high output quality.
Abstract: Diffusion models have achieved remarkable success across generative tasks, but training them remains highly resource-intensive, often requiring millions of images and GPU-days of computation. To address this limitation from a data-centric perspective, we study diffusion dataset condensation as a new and challenging problem setting: constructing a "synthetic" sub-dataset with significantly fewer samples than the original, on which high-quality diffusion models can be trained significantly faster. To the best of our knowledge, we are the first to formally study dataset condensation for diffusion models, whereas conventional dataset condensation has focused on training discriminative models. For this new challenge, we propose a novel Diffusion Dataset Condensation ($D^{2}C$) framework consisting of two phases: Select and Attach. The Select phase identifies a compact and diverse subset using a diffusion difficulty score and interval sampling; the Attach phase then enriches the conditional signals and information of the selected subset by attaching rich semantic and visual representations. Extensive experiments across dataset sizes, model architectures, and resolutions demonstrate that $D^{2}C$ trains diffusion models significantly faster with dramatically less data while retaining high visual quality. Notably, for the SiT-XL/2 architecture, $D^{2}C$ achieves a $100\times$ training speed-up, reaching an FID of 4.3 in just 40k steps using only 0.8% of the training data.
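To make the Select phase concrete, here is a minimal illustrative sketch (not the authors' implementation): it assumes a per-sample "diffusion difficulty" score is already available (e.g., a pretrained model's denoising loss) and that interval sampling means taking evenly spaced ranks from the difficulty-sorted dataset so the condensed subset spans easy-to-hard examples. The function name, keep ratio default, and score source are assumptions for illustration only.

```python
# Hypothetical sketch of interval sampling in the Select phase (illustrative only).
import numpy as np

def interval_select(difficulty: np.ndarray, keep_ratio: float = 0.008) -> np.ndarray:
    """Return indices of an interval-sampled subset covering the difficulty spectrum."""
    n_keep = max(1, int(round(len(difficulty) * keep_ratio)))
    order = np.argsort(difficulty)                      # easiest -> hardest
    positions = np.linspace(0, len(order) - 1, n_keep)  # evenly spaced ranks
    return order[positions.round().astype(int)]

# Usage: condense a dataset to ~0.8% of its size using stand-in difficulty scores.
rng = np.random.default_rng(0)
scores = rng.standard_normal(100_000)   # placeholder for per-sample denoising losses
subset_idx = interval_select(scores)    # indices spread evenly over the difficulty ranking
```

Selecting by evenly spaced ranks, rather than keeping only the easiest or hardest samples, is one way to keep the condensed subset both compact and diverse, which is the property the Select phase targets.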
Supplementary Material: zip
Primary Area: generative models
Submission Number: 12757