Timestep Embeddings Trigger Collapse in Diffusion Text Generation

Published: 24 May 2025, Last Modified: 06 Jun 2025, CoNLL 2025, CC BY 4.0
Keywords: diffusion model, diffusion language model, timestep embedding, text generation
TL;DR: This paper identifies timestep embeddings as a key factor causing the collapse problem in diffusion text generation and proposes a lightweight word embedding technique and a novel regularization method to significantly improve generation quality.
Abstract: Diffusion models, which work by iteratively refining random noise into realistic data, have achieved remarkable success in a variety of generative tasks, particularly image and audio synthesis. Recent studies have highlighted the potential of diffusion models for text generation, but several challenges remain unresolved. One significant issue is that, after a certain timestep in the generation process, the model begins to degrade the previous sample rather than refine it, resulting in broken text. In this paper, by analyzing their interactions with word embeddings, we reveal that timestep embeddings are a principal cause of this collapse problem. We further propose two key methods: (a) a simple, lightweight word embedding technique that improves both model analyzability and learning efficiency; and (b) a novel regularization applied to both word and timestep embeddings. Experimental results demonstrate that our approach effectively mitigates the collapse problem and considerably improves the quality of generated text.
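For context, the interaction the abstract refers to arises because diffusion language models typically add a timestep embedding directly into the same vector space as the (noised) word embeddings at every denoising step. The sketch below is a minimal, generic illustration of that standard injection pattern, not the paper's proposed technique; the module name NoisyTokenEncoder and all dimensions are hypothetical.

```python
import math
import torch
import torch.nn as nn

def sinusoidal_timestep_embedding(t: torch.Tensor, dim: int) -> torch.Tensor:
    """Standard sinusoidal embedding of diffusion timesteps (DDPM-style)."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs[None, :]                     # (batch, half)
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)   # (batch, dim)

class NoisyTokenEncoder(nn.Module):
    """Illustrative input layer of a diffusion LM: the timestep embedding is
    broadcast and summed into every noisy word-embedding position, so the two
    embedding types share one vector space -- the interaction the paper
    analyzes as a cause of collapse."""
    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.time_proj = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, noisy_latents: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # noisy_latents: (batch, seq_len, dim), word embeddings after forward diffusion
        t_emb = self.time_proj(sinusoidal_timestep_embedding(t, noisy_latents.size(-1)))
        return noisy_latents + t_emb[:, None, :]  # timestep signal added to every token

# Usage sketch:
# x = torch.randn(2, 16, 128)                       # noised word embeddings
# t = torch.randint(0, 1000, (2,))                  # sampled diffusion timesteps
# h = NoisyTokenEncoder(vocab_size=30522, dim=128)(x, t)
```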
Submission Number: 165