Revisiting Diffusion Models: From Generative Pre-training to One-Step Generation

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
Abstract: Diffusion distillation is a widely used technique to reduce the sampling cost of diffusion models, yet it often requires extensive training, and the student's performance tends to degrade. Recent studies show that incorporating a GAN objective may alleviate these issues, yet the underlying mechanism remains unclear. In this work, we first identify a key limitation of distillation: mismatched step sizes and parameter counts between the teacher and the student model lead them to converge to different local minima, rendering direct imitation suboptimal. We further demonstrate that a standalone GAN objective, without relying on a distillation loss, overcomes this limitation and is sufficient to convert diffusion models into efficient one-step generators. Based on this finding, we propose that diffusion training may be viewed as a form of generative pre-training, equipping models with capabilities that can be unlocked through lightweight GAN fine-tuning. Supporting this view, we create a one-step generation model by fine-tuning a pre-trained model with 85% of parameters frozen, achieving strong performance with only 0.2M images and near-SOTA results with 5M images. We further present a frequency-domain analysis that may explain the one-step generative capability gained in diffusion training. Overall, our work provides a new perspective on diffusion training, highlighting its role as a powerful generative pre-training process, which can serve as the basis for building efficient one-step generation models.
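To make the described recipe concrete, below is a minimal, hedged sketch of the general idea: taking a pretrained diffusion backbone, freezing most of its parameters, and fine-tuning the remainder as a one-step generator with a standalone non-saturating GAN objective (no distillation loss). This is not the authors' implementation; the network architectures, the choice of which layers stay trainable, and all hyperparameters here are illustrative assumptions.

```python
# Sketch only: GAN fine-tuning of a (stand-in) pretrained diffusion backbone
# as a one-step generator, with most parameters frozen. Names and shapes are
# assumptions, not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyBackbone(nn.Module):
    """Stand-in for a pretrained diffusion network epsilon_theta(x_t, t)."""
    def __init__(self, ch=64):
        super().__init__()
        self.inp = nn.Conv2d(3, ch, 3, padding=1)
        self.mid = nn.Sequential(*[nn.Conv2d(ch, ch, 3, padding=1) for _ in range(4)])
        self.out = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, x, t):
        h = F.silu(self.inp(x))
        for blk in self.mid:
            h = F.silu(blk(h)) + h
        return self.out(h)

class Discriminator(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch, ch * 2, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch * 2, 1, 4, 2, 1),
        )

    def forward(self, x):
        return self.net(x).mean(dim=(1, 2, 3))

generator = TinyBackbone()          # assume weights are loaded from diffusion pre-training
discriminator = Discriminator()

# Freeze most generator parameters; only a small subset is fine-tuned.
# (The paper reports ~85% frozen; which layers stay trainable is an assumption here.)
for p in generator.parameters():
    p.requires_grad_(False)
for p in generator.out.parameters():
    p.requires_grad_(True)

opt_g = torch.optim.Adam([p for p in generator.parameters() if p.requires_grad], lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

def one_step_sample(noise):
    """A single forward pass of the backbone acts as the one-step generator."""
    t = torch.full((noise.shape[0],), 0.0)   # fixed "time" input for the single step
    return generator(noise, t)

def training_step(real_images):
    noise = torch.randn_like(real_images)

    # Discriminator update (non-saturating GAN loss).
    fake = one_step_sample(noise).detach()
    d_loss = (F.softplus(-discriminator(real_images)) + F.softplus(discriminator(fake))).mean()
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: a pure GAN objective, with no distillation term.
    fake = one_step_sample(noise)
    g_loss = F.softplus(-discriminator(fake)).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# Toy usage: real_images = torch.randn(8, 3, 32, 32); training_step(real_images)
```

The key design choice the sketch mirrors is that the generator receives only an adversarial signal, so it is free to settle into its own solution rather than imitate the teacher's multi-step trajectory.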
Lay Summary: Modern generative models, like diffusion models, often require a slow and computationally expensive process to create high-quality images. To make them faster, a common technique called diffusion distillation trains a simpler and faster single-step student model to mimic a larger diffusion model. However, this approach typically requires extensive training and often leads to reduced performance in the student model. In this work, we show that by not forcing the student model to strictly mimic the teacher, and instead allowing it to discover its own solution, we can leverage the knowledge already acquired during diffusion pre-training to achieve fast and effective one-step generation. Our method uses fewer than 1/100 of the training images required by previous approaches and achieves performance close to the best models to date. Our research provides a new perspective on what diffusion training does and offers an efficient way to create one-step generative models.
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: Diffusion Models, One-Step Generators, Generative Models, Image Generation, GAN
Submission Number: 648