Keywords: Discrete Diffusion Models, Information Theory, Score Matching, Denoising Score Entropy (DSE), Denoising Cross-Entropy (DCE)
TL;DR: We derive information-theoretic identities for discrete diffusion, showing that optimal score-based losses exactly decompose mutual information and enable principled log-likelihood estimation.
Abstract: We present an information-theoretic framework for discrete diffusion models
that yields principled estimators of log-likelihood using score-matching losses.
Inspired by the I-MMSE identity for the Gaussian channel, we derive analogous results for the discrete setting.
Specifically, we introduce the Information–Minimum Denoising Score Entropy (I-MDSE) relation,
which links the mutual information between the data and its diffused version to the minimum denoising score entropy (DSE) loss.
We extend this theory to masked diffusion and establish the Information–Minimum Denoising Cross-Entropy (I-MDCE) relation,
connecting cross-entropy losses to mutual information in discrete masked processes.
These results provide a time-integral decomposition of the data log-likelihood in terms of optimal score-based losses,
showing that commonly used losses such as DSE and DCE are not merely variational bounds
but tight, principled estimators of the log-likelihood.
The I-MDCE decomposition further enables practical extensions, including a time-free formula,
conditional likelihood estimation in prompt–response tasks, and coupled Monte Carlo estimation of likelihood ratios.
Experiments on synthetic and real-world data confirm the accuracy, variance stability, and utility of our estimators.
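As a rough illustration of the central idea (a schematic sketch, not the paper's exact statement): the classical I-MMSE identity of Guo, Shamai, and Verdú ties the SNR-derivative of mutual information in the Gaussian channel to the minimum mean-squared error,

\[
\frac{\mathrm{d}}{\mathrm{d}\gamma}\, I\big(X;\ \sqrt{\gamma}\,X + N\big) \;=\; \tfrac{1}{2}\,\mathrm{mmse}(\gamma), \qquad N \sim \mathcal{N}(0, I),
\]

and the I-MDSE/I-MDCE decompositions described above take the schematic form

\[
-\log p(x) \;=\; \int_0^1 w(t)\, \ell^{\star}(x, t)\,\mathrm{d}t \;+\; \text{const},
\]

where \(\ell^{\star}(x, t)\) denotes the pointwise optimal DSE or DCE loss at diffusion time \(t\) and \(w(t)\) is a process-dependent weight; the exact weights and constants are derived in the paper.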
The code is publicly available at https://github.com/Dongjae0324/infodis.
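For concreteness, here is a minimal Monte Carlo sketch of such a likelihood estimate for masked diffusion (our own illustration, not the authors' released code), assuming a linear masking schedule alpha_t = 1 - t with the corresponding 1/t weight and a hypothetical model that maps a partially masked token sequence to per-position logits:

# Minimal sketch (not the authors' code): Monte Carlo estimate of -log p(x)
# for masked diffusion via the time-integral of denoising cross-entropy.
# Assumes a linear masking schedule alpha_t = 1 - t (weight 1/t) and a
# hypothetical `model` mapping a partially masked sequence to logits.
import torch
import torch.nn.functional as F

MASK_ID = 0  # hypothetical id of the mask token

def nll_estimate(model, x, n_samples=64):
    """Estimate -log p(x) by averaging the weighted denoising
    cross-entropy over random diffusion times t ~ U(0, 1)."""
    total = 0.0
    for _ in range(n_samples):
        t = torch.rand(()).clamp_min(1e-3)       # diffusion time, avoid t = 0
        mask = torch.rand(x.shape) < t           # mask each token w.p. t
        x_t = torch.where(mask, torch.full_like(x, MASK_ID), x)
        logits = model(x_t)                      # (seq_len, vocab_size)
        ce = F.cross_entropy(logits[mask], x[mask], reduction="sum")
        total += ce / t                          # 1/t weight for alpha_t = 1 - t
    return total / n_samples

The coupled likelihood-ratio estimator mentioned in the abstract would, for instance, reuse the same time t and mask pattern across the two sequences being compared so that sampling noise cancels; that refinement, and the exact schedule-dependent weights, are given in the paper and repository.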
Supplementary Material: zip
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 29019