Keywords: masked diffusion models, language models, inference
TL;DR: The paper introduces a training-free sampling algorithm for masked diffusion models that resolves conflicts among proposed tokens by deleting lower-confidence candidates
Abstract: Masked diffusion models (MDMs) offer a compelling alternative to autoregressive
models (ARMs) for discrete text generation because they enable parallel token
sampling, rather than sequential, left-to-right generation. This means
potentially much faster inference. However, effective parallel sampling faces two
competing requirements: (i) simultaneously updated tokens must be conditionally
independent, and (ii) updates should prioritise high-confidence predictions. These
goals conflict because high-confidence predictions often cluster and depend on
each other, limiting opportunities for parallel updates.
We present PUNT, a model-agnostic sampler that resolves this tension. Our
method identifies token dependencies and removes lower-confidence tokens from
conflicting groups. This produces sets of indices for unmasking that satisfy both
independence and confidence criteria. Our approach ensures improved parallel
unmasking through approximate conditional independence testing.
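The selection step described above can be illustrated with a simplified sketch. This is not the actual PUNT algorithm (the paper's dependency detection uses approximate conditional independence testing); the function name, inputs, and greedy strategy here are illustrative assumptions. Given per-position confidence scores and a set of position pairs flagged as dependent, we keep the highest-confidence positions and drop any lower-confidence position that conflicts with one already kept:

```python
# Hypothetical sketch of confidence-aware conflict resolution, NOT the
# paper's algorithm: greedily build a conflict-free set of positions to
# unmask in parallel, preferring higher-confidence predictions.

def resolve_conflicts(confidences, conflicts):
    """confidences: dict mapping position -> confidence score.
    conflicts: set of frozenset({i, j}) pairs flagged as dependent.
    Returns the set of positions selected for parallel unmasking."""
    kept = set()
    # Visit positions from most to least confident.
    for pos in sorted(confidences, key=confidences.get, reverse=True):
        # Keep this position only if it conflicts with nothing kept so far.
        if all(frozenset({pos, k}) not in conflicts for k in kept):
            kept.add(pos)
    return kept

# Example: positions 2 and 5 are flagged as dependent, so the
# lower-confidence one (5) is deferred to a later sampling step.
conf = {2: 0.9, 5: 0.8, 7: 0.95}
dep = {frozenset({2, 5})}
print(resolve_conflicts(conf, dep))  # -> {2, 7}
```

The selected positions satisfy both criteria from the abstract: no two chosen positions are flagged as dependent, and whenever two positions conflict, the higher-confidence one survives.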
Our experiments show that PUNT delivers a superior trade-off between accuracy
and compute when compared to other strong training-free baselines, especially for
generation of longer sequences. On the IFEval benchmark, it achieves up to 16%
higher accuracy over baseline methods, including sequential generation (one-by-one).
These gains hold across different hyperparameter values, mitigating the
need for brittle hyperparameter tuning. Moreover, we observe that PUNT induces
an emergent hierarchical generation strategy, where the model first establishes
high-level paragraph structure before local refinement, suggesting a planning-like
generation process that contributes to strong alignment performance.
Primary Area: generative models
Submission Number: 7925