TUBE: Tangent Upper Bound on Evidence for Discrete Diffusion Language Models
Keywords: Probabilistic methods, Discrete Diffusion Language Models, Likelihood estimation, Any-Order Autoregression, Bounding evidence in variational models
TL;DR: We introduce TUBE to upper-bound intractable log-likelihoods in latent-variable models, revealing that block masked diffusion models still trail exact autoregressive models.
Abstract: Log-likelihood is a standard metric for evaluating generative models. Unfortunately, in contrast to autoregressive models (ARMs), discrete diffusion models generally do not admit exact computation of this quantity. Existing evaluations therefore rely on the evidence lower bound (ELBO), which leaves unclear how much higher the true log-likelihood may be. We address this by introducing the **Tangent Upper Bound on Evidence** (**TUBE**), a variational upper bound on the log-likelihood that admits an unbiased Monte Carlo estimator. TUBE applies broadly to latent-variable models, including masked diffusion models (MDMs), any-order ARMs (AO-ARMs), and block variants of both. Applied to block MDMs and block AO-ARMs, TUBE yields our key empirical finding: even these upper bounds lie strictly below the exact log-likelihood of the ARM baseline, showing that ARMs still dominate in likelihood.
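To make the idea of a tangent-type upper bound with an unbiased Monte Carlo estimator concrete, here is a minimal sketch on a toy discrete latent-variable model. It assumes the bound comes from the tangent-line inequality for the concave logarithm, log y <= log c + y / c - 1 for any c > 0, applied to y = p(x) and combined with importance weights w = p(z) p(x | z) / q(z) whose expectation under q is p(x). This is an illustrative reading of the name, not necessarily TUBE's exact construction; all function names, the expansion point c, and the toy numbers are hypothetical.

```python
import math
import random


def exact_log_evidence(prior, lik):
    """Exact log p(x) = log sum_z p(z) p(x | z) for a small discrete latent."""
    return math.log(sum(pz * lz for pz, lz in zip(prior, lik)))


def exact_elbo(prior, lik, proposal):
    """ELBO = E_q[log w] with w = p(z) p(x | z) / q(z); a lower bound by Jensen."""
    return sum(q * math.log(pz * lz / q)
               for pz, lz, q in zip(prior, lik, proposal) if q > 0)


def tangent_bound(p_x, c):
    """Tangent-line upper bound on log p(x) at a (hypothetical) expansion point c > 0.

    log is concave, so log(y) <= log(c) + y / c - 1 for all y, c > 0,
    with equality exactly when y == c.
    """
    return math.log(c) + p_x / c - 1.0


def mc_tangent_estimate(prior, lik, proposal, c, num_samples, rng):
    """Unbiased Monte Carlo estimate of tangent_bound(p(x), c).

    Each weight w = p(z) p(x | z) / q(z) with z ~ q has E_q[w] = p(x).
    The bound is affine in p(x), so the sample mean of
    log(c) + w / c - 1 is unbiased for it (no log of an average is taken).
    """
    indices = range(len(prior))
    total = 0.0
    for _ in range(num_samples):
        z = rng.choices(indices, weights=proposal)[0]
        w = prior[z] * lik[z] / proposal[z]
        total += math.log(c) + w / c - 1.0
    return total / num_samples


if __name__ == "__main__":
    # Hypothetical toy model: 3 latent states, one fixed observation x.
    prior = [0.5, 0.3, 0.2]      # p(z)
    lik = [0.1, 0.4, 0.7]        # p(x | z)
    proposal = [0.2, 0.3, 0.5]   # q(z)
    rng = random.Random(0)
    log_px = exact_log_evidence(prior, lik)
    print("ELBO    ", exact_elbo(prior, lik, proposal))
    print("log p(x)", log_px)
    for c in (0.1, 0.31, 1.0):
        print(f"c={c}: bound", tangent_bound(math.exp(log_px), c),
              "MC estimate", mc_tangent_estimate(prior, lik, proposal, c, 20000, rng))
```

Because the bound is affine in p(x), averaging per-sample terms gives an unbiased estimate, whereas plugging a sample mean of weights into log would be biased. The resulting sandwich ELBO <= log p(x) <= bound is tight on the right only when c equals p(x), so in this sketch the choice of c governs how informative the upper bound is.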
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 232