Accelerating Diffusion Model with Dynamic Alignment

17 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Diffusion, Class-conditional Image Generation
Abstract: Recent studies have shown that constraining internal representations during the denoising process improves both the generation quality and training efficiency of generative diffusion models. While distilling simple visual representations is effective, it can cause over-alignment: once the model achieves alignment early in training, these simple representations become a hindrance to further training of its generative capacity. Building on prior efforts that addressed this problem from the perspectives of alignment objectives and training strategies, we introduce DyA. First, we incorporate richer alignment materials to address the problem of overly simplistic representations at the source. Second, we use the internal denoising time of the diffusion model as an indicator variable to dynamically adjust the constraint strength applied to different levels of information. Finally, we employ a Stochastic Dropout Strategy (SDS), which allows the model to emphasize generative-capacity training while still receiving guidance throughout the entire process. Experiments show that this approach improves both generation quality and training efficiency: DyA accelerates SiT training by approximately 20x, matching the performance of a SiT-XL model trained for 7M steps in only around 350K steps.
Supplementary Material: zip
Primary Area: generative models
Submission Number: 8142
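The time-dependent constraint weighting and the Stochastic Dropout Strategy described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function name `dya_alignment_loss`, the per-level weighting schedule, and the cosine-similarity objective are all assumptions made for clarity.

```python
import numpy as np

def dya_alignment_loss(feats, targets, t, drop_prob=0.5, rng=None):
    """Illustrative DyA-style alignment loss (names and schedule assumed).

    feats:     list of per-level denoiser features, each of shape (batch, dim)
    targets:   list of matching target representations, same shapes
    t:         denoising time in [0, 1], modulates the per-level weights
    drop_prob: probability of dropping the alignment term this step (SDS)
    """
    rng = rng or np.random.default_rng()

    # Stochastic Dropout Strategy: randomly skip the alignment term
    # so the model can focus on its generative objective.
    if rng.random() < drop_prob:
        return 0.0

    n = len(feats)
    loss = 0.0
    for i, (h, y) in enumerate(zip(feats, targets)):
        # Hypothetical schedule: shallower levels weighted more at small t,
        # deeper levels weighted more at large t.
        w = (1 - t) * (n - i) / n + t * (i + 1) / n
        # Negative cosine similarity as the per-level alignment objective.
        cos = (h * y).sum(-1) / (
            np.linalg.norm(h, axis=-1) * np.linalg.norm(y, axis=-1) + 1e-8)
        loss += w * (1.0 - cos.mean())
    return loss
```

In training, this term would be added to the diffusion objective with the dropout letting a fraction of steps train on generation alone.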