Smoothie: Smoothing Diffusion on Token Embeddings for Text Generation

ICLR 2026 Conference Submission19003 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: text diffusion, text generation, continuous text diffusion

TL;DR: We present Smoothie, a text diffusion method that constructs its diffusion process with consideration of the discrete nature of text and the semantic relationships between tokens.

Abstract: Diffusion models have achieved state-of-the-art performance in generating images, audio, and video, but their adaptation to text remains challenging because text is discrete. Prior approaches either apply Gaussian diffusion in continuous latent spaces, which inherits semantic structure but struggles with token decoding, or operate in the categorical simplex space, which respects discreteness but disregards semantic relations between tokens. In this paper, we propose Smoothing Diffusion on Token Embeddings (Smoothie), a novel diffusion method that combines the strengths of both approaches by progressively smoothing token embeddings based on semantic similarity. This technique enables gradual information removal while maintaining a natural decoding process. Experimental results on several sequence-to-sequence generation tasks demonstrate that Smoothie outperforms existing diffusion-based models in generation quality. Furthermore, ablation studies show that our proposed diffusion space yields better performance than both the standard embedding space and the categorical simplex.
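
The abstract does not give the exact formulation, so the following is only an illustrative sketch of what "progressively smoothing token embeddings based on semantic similarity" could look like. It assumes that similarity is a Gaussian kernel on squared Euclidean distance between embeddings and that a bandwidth `sigma` plays the role of the noise level; the names `smooth_token`, `decode`, `EMBEDDINGS`, and `sigma` are hypothetical and not taken from the paper. With a small `sigma` the token is almost one-hot, with a moderate `sigma` probability mass spreads to semantically close tokens first, and with a large `sigma` the distribution approaches uniform, so information is removed gradually. Decoding is a plain argmax over the vocabulary.

```python
import math


def smooth_token(token_id, embeddings, sigma):
    """Return a probability vector over the vocabulary for one token.

    Assumption (not from the paper): weights are a softmax over
    -||e_token - e_j||^2 / (2 * sigma^2) for every vocabulary entry j.
    """
    target = embeddings[token_id]
    logits = []
    for vec in embeddings:
        sq_dist = sum((a - b) ** 2 for a, b in zip(target, vec))
        logits.append(-sq_dist / (2.0 * sigma ** 2))
    # Numerically stable softmax.
    top = max(logits)
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def decode(probs):
    """Map a smoothed vector back to a token id by taking the argmax."""
    return max(range(len(probs)), key=probs.__getitem__)


# Toy 2-D embedding table (hypothetical values for illustration only).
EMBEDDINGS = [
    [0.0, 0.0],  # token 0
    [0.1, 0.0],  # token 1: semantically close to token 0
    [3.0, 3.0],  # token 2: far from both
]

sharp = smooth_token(0, EMBEDDINGS, sigma=0.05)   # early step: nearly one-hot
mid = smooth_token(0, EMBEDDINGS, sigma=0.5)      # mass shared with the close token
flat = smooth_token(0, EMBEDDINGS, sigma=100.0)   # late step: nearly uniform
```

In this toy setup, `mid` assigns far more probability to token 1 than to token 2, which is the sense in which such a process would respect semantic relations, while `decode` stays a simple lookup as in the categorical simplex setting.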

Supplementary Material: zip

Primary Area: generative models

Submission Number: 19003
