VALM: Variational Autoencoder Language Models for Highly Parallel Text Generation

ICLR 2026 Conference Submission 20768 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: VAE, Variational Autoencoder, LLM, Single-pass, text, parallel decoding, ELBO, discrete
TL;DR: Variational Autoencoders can be fast language models, generating tens of tokens simultaneously
Abstract: Autoregressive language models have shown impressive abilities across domains. However, their token-by-token decoding limits inference speed. We introduce Variational Autoencoder Language Models (VALM), a non-autoregressive architecture that predicts entire sequences in parallel from a single global latent, with no denoising or diffusion losses. VALM uses a bidirectional transformer encoder and decoder with an ELBO objective, reducing sequential depth from $\mathcal{O}(LT)$ to $\mathcal{O}(L)$ for an $L$-layer network generating $T$ tokens. We train VALM-1, which generates 32 tokens in a single forward pass, demonstrating the applicability of pure VAEs to discrete text and presenting a novel approach to high-throughput language modeling on standard GPUs.
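
The mechanism described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the module names, dimensions, mean-pooled latent, and sampling scheme below are assumptions. It only shows how a bidirectional (non-causal) decoder, conditioned on a single global latent, produces logits for all $T$ token positions in one forward pass, together with the Gaussian KL term of an ELBO.

```python
# Illustrative sketch only (assumed architecture, not the paper's code):
# encode a T-token sequence to one global latent z, then decode all T
# positions in a single non-causal transformer pass.
import torch
import torch.nn as nn

class VALMSketch(nn.Module):
    def __init__(self, vocab=32000, d=512, T=32, z_dim=256, layers=4, heads=8):
        super().__init__()
        self.T = T
        self.embed = nn.Embedding(vocab, d)
        self.pos = nn.Parameter(torch.zeros(T, d))          # learned positions
        enc = nn.TransformerEncoderLayer(d, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, layers)    # bidirectional encoder
        self.to_mu = nn.Linear(d, z_dim)
        self.to_logvar = nn.Linear(d, z_dim)
        self.from_z = nn.Linear(z_dim, d)
        dec = nn.TransformerEncoderLayer(d, heads, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec, layers)    # no causal mask
        self.lm_head = nn.Linear(d, vocab)

    def forward(self, tokens):                               # tokens: (B, T)
        h = self.encoder(self.embed(tokens) + self.pos)
        pooled = h.mean(dim=1)                               # single global summary
        mu, logvar = self.to_mu(pooled), self.to_logvar(pooled)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp() # reparameterization
        dec_in = self.from_z(z).unsqueeze(1) + self.pos      # broadcast z to T slots
        logits = self.lm_head(self.decoder(dec_in))          # (B, T, vocab), one pass
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return logits, kl   # ELBO loss = cross_entropy(logits, tokens) + beta * kl
```

Under these assumptions, generation would sample $z$ from the prior and take the per-position argmax of the logits, so an $L$-layer decoder performs $L$ sequential layer applications regardless of $T$, which is the $\mathcal{O}(L)$ sequential depth contrasted with $\mathcal{O}(LT)$ for token-by-token decoding.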
Primary Area: generative models
Submission Number: 20768