Keywords: VAE, Variational Autoencoder, LLM, Single-pass, text, parallel decoding, ELBO, discrete
TL;DR: Variational Autoencoders can be fast language models, generating tens of tokens simultaneously
Abstract: Autoregressive language models have shown impressive abilities across domains. However, their token-by-token decoding limits inference speed. We introduce Variational Autoencoder Language Models (VALM), a non-autoregressive architecture that predicts entire sequences in parallel from a single global latent, with no denoising or diffusion losses. VALM uses a bidirectional transformer encoder and decoder with an ELBO objective, reducing sequential depth from $\mathcal{O}(LT)$ to $\mathcal{O}(L)$ for an $L$-layer network generating $T$ tokens. We train VALM-1, which generates 32 tokens in a single forward pass, demonstrating the applicability of pure VAEs to discrete text and presenting a novel approach to high-throughput language modeling on standard GPUs.
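To make the architecture described in the abstract concrete, here is a minimal sketch of how such a model might look: a bidirectional transformer encoder pools the sequence into one global latent $z$, and a bidirectional decoder predicts all $T$ tokens from $z$ in a single parallel forward pass, trained on the standard negative ELBO, $\mathcal{L} = -\mathbb{E}_{q_\phi(z\mid x)}[\log p_\theta(x \mid z)] + \mathrm{KL}\big(q_\phi(z\mid x)\,\|\,p(z)\big)$. All module choices, dimensions, and names below (mean-pooling, broadcasting the latent to every position, the `VALMSketch` class itself) are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VALMSketch(nn.Module):
    """Illustrative VAE language model: encode a sequence into one
    global latent, then decode all T tokens in one parallel pass.
    Sizes and design choices are assumptions, not the paper's."""

    def __init__(self, vocab_size=32000, seq_len=32, d_model=512,
                 n_layers=6, n_heads=8, d_latent=256):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Parameter(torch.zeros(1, seq_len, d_model))
        # Bidirectional (no causal mask) encoder and decoder stacks.
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        dec_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec_layer, n_layers)
        self.to_mu = nn.Linear(d_model, d_latent)
        self.to_logvar = nn.Linear(d_model, d_latent)
        self.from_latent = nn.Linear(d_latent, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def encode(self, tokens):
        h = self.tok_emb(tokens) + self.pos_emb
        h = self.encoder(h).mean(dim=1)  # pool to one global vector
        return self.to_mu(h), self.to_logvar(h)

    def decode(self, z):
        # Broadcast the single latent to every position; the decoder
        # then predicts all T tokens simultaneously.
        h = self.from_latent(z).unsqueeze(1) + self.pos_emb
        return self.lm_head(self.decoder(h))

    def forward(self, tokens):
        mu, logvar = self.encode(tokens)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        logits = self.decode(z)
        recon = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), tokens.reshape(-1))
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return recon + kl  # negative ELBO

# Generation: sample z from the prior and decode 32 tokens at once,
# so sequential depth is the decoder's L layers, independent of T.
model = VALMSketch()
z = torch.randn(1, 256)
tokens = model.decode(z).argmax(-1)  # shape (1, 32)
```

The key property this sketch illustrates is the claimed depth reduction: generation runs the decoder once regardless of sequence length, giving $\mathcal{O}(L)$ sequential depth instead of the $\mathcal{O}(LT)$ of token-by-token autoregressive decoding.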
Primary Area: generative models
Submission Number: 20768