What Exactly Does Guidance Do in Masked Discrete Diffusion Models

ICLR 2026 Conference Submission21319 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Discrete Diffusion Models; Classifier-free Guidance
TL;DR: A theoretical analysis of generation in masked discrete diffusion with classifier-free guidance (CFG)
Abstract: Masked discrete diffusion models have been gaining popularity recently, and classifier-free guidance, like its continuous counterpart, has been proposed to enable effective conditional generation with discrete diffusion. To quantify the precise effect of discrete guidance, this article considers masked discrete diffusion with an arbitrary data distribution in low dimensions, so that both the distribution that guided masked discrete diffusion samples from and the sampling dynamics can be quantified and interpreted analytically and exactly. When the full data distribution is a mixture over classes and the goal is to sample from a specific class, guidance amplifies class-specific regions while suppressing regions shared with other classes. This effect depends on the guidance strength $w$ and induces distinct covariance structures in the sampled distribution. Notably, we observe quantitatively different behaviors in $1$D and $2$D. We also show that for large $w$, the decay rate of the total variation ($\text{TV}$) distance along the reverse dynamics is double-exponential in $w$ in both $1$D and $2$D. These findings highlight the role of guidance not just in shaping the output distribution, but also in controlling the dynamics of the sampling trajectory. Our theoretical analysis is supported by experiments that illustrate the geometric effects of guidance and its impact on convergence.
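To make the guidance mechanism in the abstract concrete, here is a minimal sketch (not the authors' code) of how classifier-free guidance with strength $w$ typically combines a conditional and an unconditional denoiser prediction over a token vocabulary, via $p_w \propto p_{\text{cond}}^{1+w}\, p_{\text{uncond}}^{-w}$. The function name and the toy probability vectors are illustrative assumptions; the point is that increasing $w$ amplifies tokens where the conditional model places more mass than the unconditional one, matching the "amplify class-specific regions, suppress shared regions" effect described above.

```python
import numpy as np

def guided_probs(p_cond, p_uncond, w):
    """CFG in probability space: p_w ∝ p_cond^(1+w) * p_uncond^(-w),
    renormalized over the vocabulary. w = 0 recovers p_cond."""
    logp = (1.0 + w) * np.log(p_cond) - w * np.log(p_uncond)
    logp -= logp.max()           # subtract max for numerical stability
    p = np.exp(logp)
    return p / p.sum()

# Toy 3-token vocabulary (illustrative values, not from the paper):
p_cond = np.array([0.7, 0.2, 0.1])    # class-conditional prediction
p_uncond = np.array([0.4, 0.4, 0.2])  # unconditional prediction

for w in [0.0, 1.0, 5.0]:
    print(w, guided_probs(p_cond, p_uncond, w))
```

As $w$ grows, mass concentrates on tokens maximizing the ratio $p_{\text{cond}}/p_{\text{uncond}}$, i.e. the class-specific region; in a masked diffusion sampler, each unmasking step would sample from such a guided categorical distribution.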
Supplementary Material: zip
Primary Area: generative models
Submission Number: 21319