Keywords: generative modeling, denoising diffusion, consistency model, image generation
Abstract: Recent advances in continuous generative models, encompassing multi-step approaches such as diffusion and flow matching (typically requiring $8$-$1000$ sampling steps) and few-step methods such as consistency models (typically $1$-$8$ steps), have yielded impressive generative performance.
However, existing work often treats these approaches as distinct paradigms, leading to disparate training and sampling methodologies.
We propose a unified framework for the training, sampling, and analysis of diffusion, flow matching, and consistency models.
Within this framework, we derive a unified surrogate objective showing, for the first time, that the few-step training objective can be viewed theoretically as the multi-step objective plus a regularization term.
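As an illustrative sketch (notation ours; the weighting $\lambda$ and the exact form of the regularizer $\mathcal{R}$ are assumptions, not the paper's derivation), the decomposition has the schematic form
$$\mathcal{L}_{\text{few}}(\theta) = \mathcal{L}_{\text{multi}}(\theta) + \lambda\,\mathcal{R}(\theta),$$
where $\mathcal{L}_{\text{multi}}$ is a standard diffusion or flow-matching loss and $\mathcal{R}$ penalizes inconsistency between the model's outputs across steps.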
Building on this framework, we introduce the **U**nified **C**ontinuous **G**enerative **M**odels **T**rainer and **S**ampler (**UCGM**), which enables efficient and stable training of both multi-step and few-step models.
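For intuition on how one sampler can serve both regimes, a minimal sketch appears below; the name `ucgm_style_sample`, the velocity-prediction convention for `model(x, t)`, and the plain Euler update are all illustrative assumptions, not the actual UCGM-S algorithm.

```python
import torch

def ucgm_style_sample(model, x_T, num_steps):
    """Euler sampler from t=1 (noise) to t=0 (data) for a learned velocity field.

    Illustrative sketch only: `model(x, t)` is assumed to predict a
    flow-matching velocity; small `num_steps` (1-8) mimics the few-step
    regime, larger values the multi-step regime. Not the actual UCGM-S.
    """
    ts = torch.linspace(1.0, 0.0, num_steps + 1)  # time grid, noise -> data
    x = x_T
    for t, t_next in zip(ts[:-1], ts[1:]):
        v = model(x, t)               # predicted velocity at (x, t)
        x = x + (t_next - t) * v      # Euler step toward the data end
    return x

# Usage with a dummy velocity model (illustration only):
dummy_model = lambda x, t: -x
samples = ucgm_style_sample(dummy_model, torch.randn(4, 3, 32, 32), num_steps=2)
```

The single `num_steps` knob is the point of the sketch: the same sampling loop covers both the multi-step and few-step settings, differing only in the number of function evaluations.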
Empirically, our framework achieves state-of-the-art results.
On ImageNet $256\times256$ with a $675$M-parameter diffusion transformer, UCGM-T trains a multi-step model achieving $1.30$ FID in $20$ steps, and a few-step model achieving $1.42$ FID in only $2$ steps.
Moreover, applying UCGM-S to REPA-E improves its FID from $1.26$ (at $250$ steps) to $1.06$ in only $40$ steps, without any additional training.
Primary Area: generative models
Submission Number: 24943