Keywords: discrete diffusion, diffusion language models, large language models, likelihood estimation
TL;DR: DUEL gives masked diffusion models proper perplexity via exact likelihood under their deterministic test-time policies, closing up to 32% (in-domain) and 82% (zero-shot) of their perplexity gap with autoregressive models.
Abstract: Masked diffusion models (MDMs) generate text by iteratively selecting positions to unmask and then predicting tokens at those positions. Yet MDMs lack proper likelihood evaluation: the evidence lower bound (ELBO) is not only a loose bound on log-likelihood, but, as we show, is also computed under the training distribution rather than the test-time distribution. We resolve this within our DUEL framework, which unifies leading MDM sampling strategies that employ deterministic position selection. We prove that DUEL samplers admit exact likelihood computation under the test-time distribution, giving MDMs proper likelihood, and hence proper perplexity, for the first time. This proper perplexity is the natural analogue of autoregressive perplexity and lets us revisit key questions about MDMs. Under proper evaluation, MDMs are substantially better than previously thought: the perplexity gap between MDMs and autoregressive models (ARMs) shrinks by up to 32% on in-domain data and 82% on zero-shot benchmarks. DUEL also enables the first principled comparison of fast, parallel samplers across compute budgets (an analysis impossible with the ELBO and unreliable with generative perplexity), identifying a strong default method. Finally, an oracle ordering (with ground-truth access) improves MDM perplexity well beyond that of the ARM (36.47 vs. 52.11 on AG News), revealing substantial room for improved inference-time orderings.
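To make the idea of exact likelihood under deterministic position selection concrete, here is a minimal sketch in Python. It is one reading of the abstract, not the authors' implementation: `toy_denoiser`, `greedy_confidence`, the three-token vocabulary, and the one-position-per-step schedule are all hypothetical stand-ins. The point it illustrates is that when the unmasking policy is a deterministic function of the current state, each sequence is reachable by exactly one unmasking path, so its log-likelihood is the sum of the log-probabilities of the true tokens along that path.

```python
import itertools
import math
from typing import List, Optional, Sequence

MASK = None  # hypothetical mask symbol
VOCAB = 3    # toy vocabulary size (assumption for illustration)


def toy_denoiser(state: Sequence[Optional[int]]) -> List[List[float]]:
    """Hypothetical stand-in for an MDM: one token distribution per position."""
    dists = []
    for i in range(len(state)):
        left = state[i - 1] if i > 0 else MASK
        # Position-dependent preference, plus a bonus for copying a revealed left neighbor.
        weights = [1.0 + ((v + i) % VOCAB) + (3.0 if v == left else 0.0)
                   for v in range(VOCAB)]
        total = sum(weights)
        dists.append([w / total for w in weights])
    return dists


def greedy_confidence(state: Sequence[Optional[int]], dists: List[List[float]]) -> int:
    """Deterministic policy: the masked position with the highest top probability.

    Ties are broken in favor of the lowest index.
    """
    masked = [i for i, tok in enumerate(state) if tok is MASK]
    return max(masked, key=lambda i: (max(dists[i]), -i))


def exact_log_likelihood(x: Sequence[int]) -> float:
    """Log-probability that the sampler (denoiser + policy) generates exactly x."""
    state: List[Optional[int]] = [MASK] * len(x)
    logp = 0.0
    for _ in range(len(x)):
        dists = toy_denoiser(state)
        i = greedy_confidence(state, dists)  # position chosen at test time
        logp += math.log(dists[i][x[i]])     # probability of the true token there
        state[i] = x[i]                      # reveal the true token and continue
    return logp


def perplexity(x: Sequence[int]) -> float:
    """Per-token perplexity derived from the exact log-likelihood."""
    return math.exp(-exact_log_likelihood(x) / len(x))


def total_mass(length: int) -> float:
    """Sum of probabilities over all sequences of the given length."""
    return sum(math.exp(exact_log_likelihood(x))
               for x in itertools.product(range(VOCAB), repeat=length))
```

In this toy setting, `total_mass(3)` sums the probabilities of all 27 sequences and should equal 1 up to floating-point error, which is what distinguishes a proper likelihood from a bound such as the ELBO. Swapping `greedy_confidence` for another deterministic rule (for example, left-to-right) changes the induced distribution but not the normalization.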
Submission Number: 151