Generative Data Augmentation via Diffusion Distillation, Adversarial Alignment, and Importance Reweighting
Keywords: Generative Data Augmentation, Diffusion Model Distillation
Abstract: Generative data augmentation (GDA) leverages generative models to enrich training sets with entirely new samples drawn from the modeled data distribution, aiming for downstream performance gains. However, using powerful contemporary diffusion models for GDA remains impractical: *i)* their thousand-step sampling loop inflates the wall-time and energy cost of every augmented image; and *ii)* the divergence between the synthetic and real distributions is unknown, so classifiers trained on synthetic data receive biased gradients.
We propose DAR-GDA, a three-stage augmentation pipeline uniting model **D**istillation, **A**dversarial alignment, and importance **R**eweighting, which makes diffusion-quality augmentation both fast and beneficial for downstream learning.
In particular, a teacher diffusion model is compressed into a one-step student via score distillation, cutting the per-image sampling cost by $>100\times$ while preserving FID.
During this distillation (D), the student additionally undergoes adversarial alignment (A), receiving a direct training signal against real images that supplements the teacher's guidance and brings the student closer to the true data distribution.
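To make the D and A stages concrete, the following is a minimal PyTorch-style sketch. All module names are illustrative, and the simplified forward process plus MSE regression toward a frozen teacher stand in for the submission's actual score-distillation and adversarial objectives, which may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for the real networks; names and shapes are illustrative only.
img_dim = 32 * 32 * 3
student = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, img_dim))          # one-step generator
teacher = nn.Sequential(nn.Linear(img_dim + 1, 512), nn.ReLU(), nn.Linear(512, img_dim))  # frozen denoiser
disc    = nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU(), nn.Linear(256, 1))            # adversarial critic

opt_g = torch.optim.Adam(student.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
for p in teacher.parameters():
    p.requires_grad_(False)  # the teacher only provides targets

def train_step(real_images):
    batch = real_images.size(0)

    # (D) one-step generation, then regression toward the frozen teacher's denoised estimate.
    z = torch.randn(batch, 128)
    x_fake = student(z)
    t = torch.rand(batch, 1)                         # random diffusion time in [0, 1]
    noise = torch.randn_like(x_fake)
    x_noisy = (1 - t) * x_fake + t * noise           # simplified forward (noising) process
    x_teacher = teacher(torch.cat([x_noisy, t], dim=1))
    loss_distill = F.mse_loss(x_fake, x_teacher.detach())

    # (A) adversarial alignment: non-saturating GAN loss against real images.
    loss_adv = F.softplus(-disc(x_fake)).mean()

    opt_g.zero_grad()
    (loss_distill + loss_adv).backward()
    opt_g.step()

    # Discriminator update on real images vs. detached fakes.
    loss_d = F.softplus(-disc(real_images)).mean() + F.softplus(disc(x_fake.detach())).mean()
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

train_step(torch.randn(8, img_dim))  # dummy batch, for illustration only
```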
The discriminator from this adversarial stage inherently learns to assess the synthetic-to-real data gap. Its calibrated probabilistic outputs are then used for reweighting (R): importance weights derived from them quantify the distributional gap and rescale the classification loss when training downstream models. We show that this reweighting yields an unbiased stochastic estimator of the real-data risk, fostering training dynamics akin to those on genuine samples.
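As a rough sketch of the reweighting argument (our notation, omitting label conditioning; the submission's exact formulation may differ): if the discriminator $D$ is calibrated, so that $D(x) \approx p_{\mathrm{real}}(x) / \big(p_{\mathrm{real}}(x) + p_{\mathrm{syn}}(x)\big)$, then

$$
w(x) \;=\; \frac{D(x)}{1 - D(x)} \;\approx\; \frac{p_{\mathrm{real}}(x)}{p_{\mathrm{syn}}(x)},
\qquad
\mathbb{E}_{x \sim p_{\mathrm{syn}}}\!\big[\, w(x)\,\ell(f(x), y) \,\big]
\;=\;
\mathbb{E}_{x \sim p_{\mathrm{real}}}\!\big[\, \ell(f(x), y) \,\big],
$$

where the equality holds when $w$ equals the true density ratio, so the weighted loss on synthetic samples estimates the real-data risk.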
Experiments validate DAR-GDA's synergistic design, with progressive accuracy gains from each D-A-R stage. Our approach not only surpasses conventional non-foundation-model GDA baselines but also matches or exceeds the GDA performance of large, web-pretrained text-to-image models, despite using only in-domain data.
DAR-GDA thus offers diffusion-fidelity GDA samples efficiently, while correcting synthetic-to-real bias to benefit downstream tasks.
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 11037