The Diffusion Duality

Published: 06 Mar 2025, Last Modified: 22 Apr 2025 · ICLR 2025 DeLTa Workshop Oral · CC BY 4.0
Track: long paper (up to 8 pages)
Keywords: Diffusion Models, Language Models, Discrete Diffusion
TL;DR: We demonstrate that discrete diffusion emerges from Gaussian diffusion, allowing us to adapt techniques from the latter to improve the former.
Abstract: In the context of language modeling, Uniform State discrete Diffusion Models (USDMs) hold the promise of faster generation due to their ability to self-correct. However, they are typically outperformed by Masked Diffusion Models (MDMs). In this work, we tighten the likelihood gap between USDMs and MDMs by exploiting a fundamental insight: Uniform state diffusion processes naturally emerge from an underlying Gaussian diffusion. Our method, DUO, transfers powerful techniques from Gaussian diffusion to improve both training and sampling. First, we introduce a curriculum learning strategy guided by the Gaussian process, doubling training speed by reducing variance. Models trained with curriculum learning surpass autoregressive models in zero-shot perplexity on 3 of 7 benchmarks. Second, we present Discrete Consistency Distillation, which adapts consistency distillation from the continuous to the discrete setting. This method accelerates sampling by two orders of magnitude, while preserving both quality and diversity. The code and the trained models are available at the project page: https://s-sahoo.com/duo
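To make the stated duality concrete, below is a minimal numerical sketch (not the authors' code; the vocabulary size and noise-schedule values are illustrative assumptions). It shows that applying argmax to a Gaussian diffusion over one-hot token vectors produces either the clean token or, by symmetry, a uniformly random one, which is exactly the marginal of a uniform-state discrete diffusion.

```python
import numpy as np

# Hedged sketch: argmax of a Gaussian-noised one-hot vector behaves like a
# uniform-state discrete diffusion step. Values of V, alpha_t, sigma_t are
# arbitrary choices for illustration, not taken from the paper.

rng = np.random.default_rng(0)
V = 8            # vocabulary size (assumed)
alpha_t = 0.7    # Gaussian signal level at time t (assumed)
sigma_t = 0.5    # Gaussian noise level at time t (assumed)

x = np.zeros(V)
x[3] = 1.0       # clean token encoded as a one-hot vector

n_samples = 200_000
z = alpha_t * x + sigma_t * rng.standard_normal((n_samples, V))  # Gaussian latents
tokens = z.argmax(axis=1)                                        # discretize via argmax

counts = np.bincount(tokens, minlength=V) / n_samples
print("empirical marginal:", np.round(counts, 3))
# The clean token (index 3) retains extra probability mass; the remaining mass
# spreads evenly over the other V - 1 tokens. This is the uniform-state
# discrete diffusion marginal Cat(a_t * x + (1 - a_t) / V) for some a_t that
# depends on the Gaussian signal-to-noise ratio and the vocabulary size.
```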
Submission Number: 109