Compressed and Smooth Latent Space for Text Diffusion Modeling

Published: 18 Sept 2025, Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: latent diffusion, text generation, language models, autoencoders
TL;DR: We propose Cosmos, a diffusion-based text generation method that compresses text representations by 8×, generating high-quality text over twice as fast as standard diffusion models.
Abstract: Autoregressive language models dominate modern text generation, yet their sequential nature introduces fundamental limitations: decoding is slow, and maintaining global coherence remains challenging. Diffusion models offer a promising alternative by enabling parallel generation and flexible control; however, their application to text generation is hindered by the high dimensionality of token-level representations. We introduce Cosmos, a novel approach to text generation that operates entirely in a compressed, smooth latent space tailored specifically for diffusion. This space is learned using an autoencoder trained simultaneously for token-level reconstruction and alignment with frozen activations from a pretrained language encoder, providing robust semantic grounding and enabling effective perturbation-based augmentations. Empirically, we demonstrate that text representations can be compressed up to $8\times$ while maintaining generation quality comparable to token-level diffusion models. Furthermore, increasing the latent sequence length allows Cosmos to surpass both diffusion-based and autoregressive baselines. We evaluate Cosmos on four diverse generative tasks (story generation, question generation, summarization, and detoxification) and compare it with various generative paradigms. Cosmos achieves comparable or superior generation quality while offering more than $2\times$ faster inference. Code is released at https://github.com/MeshchaninovViacheslav/cosmos.
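The abstract describes an autoencoder trained jointly for token reconstruction and alignment with a frozen pretrained encoder. The sketch below illustrates one plausible form of that joint objective; it is a minimal illustration, not the released Cosmos code. The convolutional compressor, the `frozen_encoder` interface, and the hyperparameters (`align_weight`, `noise_std`) are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch only: module choices, shapes, and the exact loss form
# are assumptions, not the authors' implementation.

class LatentTextAutoencoder(nn.Module):
    """Compresses a token sequence by `compression`x into latents that are
    (a) decodable back to tokens and (b) aligned with a frozen encoder."""

    def __init__(self, vocab_size=32000, d_model=768, compression=8):
        super().__init__()
        self.compression = compression
        self.embed = nn.Embedding(vocab_size, d_model)
        # Strided convolution stands in for 8x sequence-length compression.
        self.down = nn.Conv1d(d_model, d_model, kernel_size=compression, stride=compression)
        # Transposed convolution upsamples latents back to token resolution.
        self.up = nn.ConvTranspose1d(d_model, d_model, kernel_size=compression, stride=compression)
        self.to_logits = nn.Linear(d_model, vocab_size)

    def encode(self, tokens):
        x = self.embed(tokens).transpose(1, 2)                # (B, d_model, L)
        return self.down(x).transpose(1, 2)                   # (B, L // compression, d_model)

    def decode(self, latents):
        x = self.up(latents.transpose(1, 2)).transpose(1, 2)  # (B, L, d_model)
        return self.to_logits(x)                              # (B, L, vocab_size)


def autoencoder_loss(model, frozen_encoder, tokens, align_weight=1.0, noise_std=0.1):
    """Assumed joint objective: token reconstruction + latent alignment with
    frozen activations, with a Gaussian perturbation on the latents as a
    simple smoothing augmentation."""
    latents = model.encode(tokens)
    # Perturbation-based augmentation: decode from noised latents.
    noisy_latents = latents + noise_std * torch.randn_like(latents)
    logits = model.decode(noisy_latents)
    recon = F.cross_entropy(logits.flatten(0, 1), tokens.flatten())

    with torch.no_grad():
        target = frozen_encoder(tokens)                       # (B, L, d_model), kept frozen
    # Average-pool frozen activations to the latent length before matching.
    target = F.avg_pool1d(target.transpose(1, 2), model.compression).transpose(1, 2)
    align = F.mse_loss(latents, target)
    return recon + align_weight * align
```

Once such an autoencoder is trained, a diffusion model would operate on the short latent sequence (length L / 8 rather than L), which is the source of the reported inference speedup; the generated latents are then decoded back to tokens.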
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 24220