Elucidating Guidance in Variance Exploding Diffusion Models: Fast Convergence and Better Diversity

Published: 03 Mar 2026, Last Modified: 05 Mar 2026 · ICLR 2026 DeLTa Workshop Poster · CC BY 4.0
Keywords: Guidance Method for Diffusion Models, Convergence Guarantee, Variance Exploding Diffusion Models
Abstract: While guidance is a standard component of conditional diffusion models, theoretical guarantees have largely focused on Variance-Preserving (VP) models, overlooking state-of-the-art Variance-Exploding (VE) frameworks. In this work, we elucidate, for the first time, the influence of guidance in VE models and explain, in the context of Gaussian mixture models, why VE models outperform VP models from the perspectives of classification confidence and diversity. For classification confidence, we prove that the confidence converges with respect to the guidance strength $\eta$ at rate $1-\eta^{-1}(\log \eta)^2$ for VE models, which is faster than the $1-\eta^{-e^{-T}}(\log \eta)^{2 e^{-T}}$ rate for VP models (where $T$ is the diffusion time). This indicates that VE models align more strongly with the given condition, which is crucial for conditional generation. For diversity, prior work shows that under strong guidance, VP models tend to generate extreme samples and suffer from mode collapse. In contrast, we show that because the VE forward process preserves the multi-modal structure of the data, VE models are better at avoiding mode collapse under strong guidance. Simulation and real-world experiments support these theoretical results.
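The abstract's diversity argument rests on a structural difference between the two forward processes: VP scales the data toward zero while adding noise, which pulls mixture modes together, whereas VE leaves the data unscaled and only adds noise, so the mode means stay separated. The following minimal sketch illustrates this on a 1-D two-component Gaussian mixture; the diffusion time `T` and noise level `sigma_T` are illustrative values chosen for this demo, not parameters taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-mode Gaussian mixture: means at -5 and +5, unit variance per mode.
n = 5000
x0 = np.concatenate([rng.normal(-5.0, 1.0, n), rng.normal(5.0, 1.0, n)])
labels = np.array([0] * n + [1] * n)

T = 3.0  # illustrative diffusion time

# VP forward: x_T = e^{-T/2} x_0 + sqrt(1 - e^{-T}) eps
# The e^{-T/2} factor shrinks both mode means toward 0.
vp = np.exp(-T / 2) * x0 + np.sqrt(1 - np.exp(-T)) * rng.standard_normal(x0.shape)

# VE forward: x_T = x_0 + sigma_T eps, with sigma_T growing ("exploding") in T.
# No scaling of x_0, so the mode means are untouched.
sigma_T = 10.0  # illustrative noise level
ve = x0 + sigma_T * rng.standard_normal(x0.shape)

# Gap between the two per-mode means after each forward process.
vp_gap = vp[labels == 1].mean() - vp[labels == 0].mean()
ve_gap = ve[labels == 1].mean() - ve[labels == 0].mean()
print(f"mode-mean gap: clean=10.0  VP={vp_gap:.2f}  VE={ve_gap:.2f}")
```

Under VP the gap contracts by the factor $e^{-T/2}$ (here roughly $10 e^{-1.5} \approx 2.2$), while under VE it stays near the original 10 up to sampling noise, matching the claim that the VE forward process maintains the multi-modal property of the data.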
Submission Number: 67