Discrete Diffusion Language Modeling by Estimating the Ratios of the Data Distribution

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Diffusion Models, Discrete Diffusion Models, Language Modeling, Transformers
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We scale discrete diffusion models to the GPT-2 regime using our novel score entropy loss function.
Abstract: Despite their groundbreaking performance on many generative modeling tasks, diffusion models have fallen short on discrete data domains such as natural language. Crucially, standard diffusion models rely on the well-established theory of score matching, but efforts to generalize it to discrete structures have not yielded the same empirical gains. In this work, we bridge this gap by proposing score entropy, a novel discrete score matching loss that is more stable than existing methods, forms an ELBO for maximum likelihood training, and can be efficiently optimized with a denoising variant. Combined with architectural improvements, we scale to the GPT-2 language modeling experiments, achieving highly competitive performance. Comparing similarly sized architectures, our score entropy discrete diffusion model attains comparable zero-shot perplexities despite only reporting an upper bound (within $15$ percent of, and sometimes outperforming, GPT-2), can trade off speed for generation quality ($4\times$ lower generative perplexity when matching function evaluations and $16\times$ fewer function evaluations when matching generative perplexity, compared to standard autoregressive sampling), and enables arbitrary infilling beyond standard left-to-right autoregressive prompting.
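For concreteness, here is a minimal sketch of the score entropy objective the abstract refers to, written in notation consistent with the setup (a data distribution $p$ and a network $s_\theta(x)_y$ estimating the ratios $p(y)/p(x)$); the weights $w_{xy}$ and the exact presentation are our gloss rather than a quotation from the paper:

$$\mathcal{L}_{\mathrm{SE}} = \mathbb{E}_{x \sim p}\Bigg[\sum_{y \neq x} w_{xy}\Big(s_\theta(x)_y - \tfrac{p(y)}{p(x)} \log s_\theta(x)_y + K\big(\tfrac{p(y)}{p(x)}\big)\Big)\Bigg], \qquad K(a) = a(\log a - 1).$$

Each summand is convex in $s_\theta(x)_y$ and vanishes exactly when $s_\theta(x)_y = p(y)/p(x)$, so the loss is nonnegative and minimized only by the true ratios; the denoising variant mentioned above would replace the intractable ratio with a tractable expectation over the forward noising process, analogous to denoising score matching in the continuous case.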
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6401