Demystifying MaskGIT Sampler and Beyond: Adaptive Order Selection in Masked Diffusion

Published: 19 Apr 2026, Last Modified: 19 Apr 2026Accepted by TMLREveryoneRevisionsBibTeXCC BY 4.0
Abstract: Masked diffusion models have shown promising performance in generating high-quality samples in a wide range of domains, but accelerating their sampling process remains relatively underexplored. To investigate efficient samplers for masked diffusion, this paper theoretically analyzes the MaskGIT sampler for image modeling, revealing its implicit temperature sampling mechanism. Through this analysis, we show that MaskGIT is asymptotically equivalent to a choose-then-sample (CTS) formulation, instantiated as the “moment sampler,” which explicitly separates index selection from token sampling. This CTS reformulation is essential: it yields unbiased token sampling and exposes an algorithmic design space for index selection, both of which are inaccessible in MaskGIT’s original formulation. Regarding token sampling, we reveal that MaskGIT implicitly adopts a low-temperature sampler, which explains why MaskGIT often degrades with more sampling steps. The CTS reformulation of MaskGIT allows to fix the temperature sampling to ensure unbiasedness. We also improve the index selection in CTS through two key innovations: a partial caching technique for transformers that approximates longer sampling trajectories without proportional computational cost, and a hybrid approach formalizing the exploration-exploitation trade-off in adaptive unmasking. Experiments in image and text domains demonstrate our theory as well as the efficiency of our proposed methods, advancing both theoretical understanding and practical implementation of masked diffusion samplers.
Certifications: Featured Certification
Submission Type: Regular submission (no more than 12 pages of main content)
Supplementary Material: zip
Assigned Action Editor: ~Kangwook_Lee1
Submission Number: 7548
Loading