Diffusion Adversarial Post-Training for One-Step Video Generation

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: Convert a slow diffusion model into a one-step, real-time, high-resolution video generator through adversarial post-training
Abstract: Diffusion models are widely used for image and video generation, but their iterative generation process is slow and expensive. While existing distillation approaches have demonstrated the potential for one-step generation in the image domain, they still suffer from significant quality degradation. In this work, we propose Adversarial Post-Training (APT) against real data following diffusion pre-training for one-step video generation. To improve training stability and quality, we introduce several improvements to the model architecture and training procedures, along with an approximated R1 regularization objective. Empirically, our experiments show that our adversarially post-trained model can generate two-second, 1280x720, 24fps videos in real time using a single forward evaluation step. Additionally, our model is capable of generating 1024px images in a single step, achieving quality comparable to state-of-the-art methods.
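To make the "approximated R1 regularization objective" mentioned in the abstract concrete, here is a minimal, hypothetical PyTorch sketch of one common way to approximate the R1 gradient penalty without the double backpropagation the exact penalty requires: penalize the change in discriminator output under a small Gaussian perturbation of real samples. The function name, the `sigma` parameter, and the discriminator interface are illustrative assumptions, not the paper's exact formulation.

```python
import torch


def approximated_r1_loss(discriminator: torch.nn.Module,
                         real: torch.Tensor,
                         sigma: float = 0.01) -> torch.Tensor:
    """Finite-difference proxy for the R1 penalty E[||grad_x D(x)||^2].

    Instead of differentiating through the discriminator twice, perturb
    real samples with small Gaussian noise and penalize the squared
    change in the discriminator's output.
    """
    perturbed = real + sigma * torch.randn_like(real)
    d_real = discriminator(real)
    d_perturbed = discriminator(perturbed)
    # For small sigma, this mean squared difference scales like
    # sigma^2 * E[||grad_x D(x)||^2], i.e. the R1 penalty up to a constant.
    return ((d_real - d_perturbed) ** 2).mean()
```

In a GAN training loop, this term would be added (with a weighting coefficient) to the discriminator loss on real samples, serving the same stabilizing role as the exact R1 gradient penalty at a fraction of the cost.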
Lay Summary: We propose a technique to convert a slow video diffusion model into a one-step generator. Our model can generate 1280x720, 24fps videos using only a single step. It runs in real time on 8×H100 GPUs.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: video generation, image generation, diffusion, adversarial training, GAN, real-time, fast, distillation, generative model
Submission Number: 5127