Keywords: Diffusion Models, Reinforcement Learning, Few-step Generation, Generative Model Acceleration
Abstract: Diffusion Models have emerged as a leading class of generative models, yet their iterative sampling process remains computationally expensive.
Timestep distillation is a promising technique to accelerate generation, but it often requires extensive training and leads to a degradation in image quality.
Furthermore, fine-tuning these distilled models to optimize for specific objectives, such as aesthetic appeal or user preference, using Reinforcement Learning (RL) is notoriously unstable and prone to reward hacking.
In this work, we introduce Flash-DMD, a novel framework that enables fast-converging distillation and stable RL-based refinement.
Specifically, we first propose an efficient timestep-aware distillation strategy that significantly reduces training cost while improving human preference scores and realism. Second, and most critically, we introduce a joint training scheme in which the model is fine-tuned with an RL objective while timestep distillation training continues simultaneously. We demonstrate that the stable, well-defined loss from the ongoing distillation acts as a powerful regularizer, effectively stabilizing the RL training process and preventing policy collapse.
Our experiments show that our proposed Flash-DMD not only converges significantly faster but also achieves state-of-the-art generation quality in the 4-step sampling regime, outperforming existing methods in human preference evaluations. Our work presents a robust and effective paradigm for training efficient, high-fidelity, and stable generative models. Code will be released soon.
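The joint training scheme described in the abstract amounts to optimizing a weighted combination of the RL objective and the ongoing distillation loss. A minimal sketch of one such update, assuming a simple additive weighting (all names and the weighting coefficient `beta` are illustrative, not the paper's actual formulation):

```python
import numpy as np

def joint_update(theta, grad_rl, grad_distill, beta=0.1, lr=1e-3):
    """One hypothetical gradient step of the joint scheme.

    The distillation gradient (grad_distill) is always applied and acts
    as a regularizer anchoring the policy, while the RL gradient
    (grad_rl) refines it toward the reward; beta trades off refinement
    against stability.
    """
    theta = np.asarray(theta, dtype=float)
    return theta - lr * (np.asarray(grad_distill) + beta * np.asarray(grad_rl))

# Toy usage: parameters move mostly along the distillation gradient,
# with a small RL correction.
theta = joint_update([0.0, 0.0], grad_rl=[10.0, 0.0], grad_distill=[1.0, 1.0])
```

This sketch only conveys the regularization idea; the paper's actual losses, schedules, and weighting are not specified in the abstract.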
Supplementary Material: zip
Primary Area: generative models
Submission Number: 10872