Keywords: Discrete Diffusion Models, Language Modeling
Abstract: Discrete diffusion models (DDMs) present a promising alternative to autoregressive models, offering bidirectional attention, parallel generation, and greater controllability. However, existing DDMs use either (i) a uniform diffusion process, which provides token-level refinement but may cause abrupt changes in meaning at the sequence level, or (ii) an absorbing diffusion process, which ensures stable semantic evolution but sacrifices token-level refinement after unmasking. To resolve this dilemma, we synergize the advantages of both with the Mixture of Absorbing and Uniform Diffusion (MAUD) model. MAUD constructs a novel state transition matrix that interpolates between the two diffusion processes, simultaneously achieving sequence-level semantic stability and gradual token-level refinement. Empirical results show that MAUD outperforms existing DDMs in both language generation and language understanding tasks.
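To make the interpolation concrete, here is a minimal sketch of a one-step transition matrix that mixes an absorbing component (tokens jump to a `[MASK]` state and stay there) with a uniform component (tokens jump to a uniformly random token). The function name `mixture_transition_matrix` and the parameters `beta` (per-step corruption probability) and `lam` (interpolation weight) are illustrative assumptions, not the paper's actual construction:

```python
import numpy as np

def mixture_transition_matrix(vocab_size: int, beta: float, lam: float) -> np.ndarray:
    """One-step transition matrix mixing absorbing and uniform diffusion.

    Indices 0..vocab_size-1 are regular tokens; index vocab_size is [MASK].
    beta: per-step corruption probability (illustrative).
    lam:  interpolation weight; lam=1 is pure absorbing, lam=0 pure uniform.
    This is a sketch of the general idea, not MAUD's exact matrix.
    """
    V = vocab_size + 1  # include the absorbing [MASK] state

    # Uniform component: with prob beta, jump to a uniformly random regular token.
    Q_uniform = (1.0 - beta) * np.eye(V)
    Q_uniform[:, :vocab_size] += beta / vocab_size

    # Absorbing component: with prob beta, jump to [MASK]; [MASK] never leaves.
    Q_absorb = (1.0 - beta) * np.eye(V)
    Q_absorb[:, vocab_size] += beta
    Q_absorb[vocab_size, :] = 0.0
    Q_absorb[vocab_size, vocab_size] = 1.0

    # Linear interpolation between the two processes.
    return lam * Q_absorb + (1.0 - lam) * Q_uniform

Q = mixture_transition_matrix(vocab_size=4, beta=0.1, lam=0.5)
```

Each row of `Q` remains a valid probability distribution, so the mixture is itself a well-defined Markov transition matrix; multi-step marginals follow from products of such matrices, as in standard discrete diffusion formulations.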
Primary Area: generative models
Submission Number: 8697