Stop Training for the Worst: Progressive Unmasking Accelerates Masked Diffusion Training

Published: 02 Mar 2026, Last Modified: 14 Mar 2026
Venue: ICLR 2026 Workshop MM Intelligence Poster
License: CC BY 4.0
Track: long paper (up to 8 pages)
Keywords: Discrete Diffusion, Masked Diffusion Models, Diffusion Models, Distribution Design, Learning Theory
TL;DR: We propose PUMA, a simple modification of the masked diffusion forward process that accelerates MDM pretraining.
Abstract: Masked Diffusion Models (MDMs) have emerged as a promising approach to generative modeling in discrete spaces. By generating sequences in any order and allowing parallel decoding, they enable fast inference and strong performance on non-causal tasks. However, this flexibility comes with a training-complexity trade-off: MDMs train on an exponentially large set of masking patterns, which is not only computationally expensive but also creates a train-test mismatch between the random masks used during training and the highly structured masks induced by inference-time unmasking. In this work, we propose Progressive UnMAsking (PUMA), a simple modification of the forward masking process that aligns training-time and inference-time masking patterns, thereby focusing optimization on inference-aligned masks and speeding up training. Empirically, PUMA speeds up pretraining at the 125M scale and offers complementary advantages on top of common recipes such as autoregressive initialization.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 36
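The abstract contrasts the standard MDM forward process, which masks positions independently at random, with an order-aligned scheme whose masks match the nested, structured masks visited by iterative inference-time unmasking. Below is a minimal illustrative sketch of that contrast, not the paper's exact PUMA procedure: the function names (`iid_mask`, `progressive_mask`), the placeholder `MASK_ID`, and the choice of a uniformly random per-sequence unmasking order are all assumptions made for illustration.

```python
# Illustrative sketch only (assumed details, not the published PUMA algorithm):
# contrasting i.i.d. random masking with an order-aligned, "progressive" masking
# scheme whose masks are nested across masking levels.
import numpy as np

MASK_ID = -1  # hypothetical mask-token id


def iid_mask(tokens: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Standard MDM forward process: sample a masking level t ~ U(0, 1)
    and mask each position independently with probability t."""
    t = rng.uniform()
    masked = tokens.copy()
    masked[rng.uniform(size=tokens.shape) < t] = MASK_ID
    return masked


def progressive_mask(tokens: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Order-aligned forward process: fix one unmasking order per sequence,
    then mask the not-yet-unmasked suffix of that order. For a fixed order,
    masks at different levels are nested, mimicking the structured masks an
    iterative sampler visits at inference time."""
    order = rng.permutation(tokens.shape[0])    # hypothetical unmasking order
    t = rng.uniform()                           # masking level
    n_masked = int(round(t * tokens.shape[0]))  # positions still masked
    masked = tokens.copy()
    masked[order[:n_masked]] = MASK_ID
    return masked


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toks = np.arange(10)
    print("i.i.d. mask:      ", iid_mask(toks, rng))
    print("progressive mask: ", progressive_mask(toks, rng))
```

The relevant difference in this sketch is that `iid_mask` draws an independent pattern for every masking level, whereas `progressive_mask` reuses a single unmasking order, so the training masks concentrate on the kind of structured, nested patterns that inference actually produces.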