Archiving Submission: No (non-archival)
Keywords: Continuous Tokenization, Mixture of Gaussians
TL;DR: We introduce a method for continuous-token autoregressive generation using a mixture of Gaussians.
Abstract: Autoregressive sequence models have traditionally relied on discrete tokenizations to leverage cross-entropy training, but this discretization introduces information loss that is costly in high-dimensional domains such as video. Higher-capacity tokens enable higher-quality generations and allow fewer tokens to represent a single image, thus improving training and inference time. We propose a continuous-token autoregressive framework that parameterizes each step's output distribution as a mixture of Gaussians. A lightweight Mixture of Gaussians (MoG) head predicts mixture weights, means, and full covariance factors, and is trained end-to-end by minimizing the Gaussian negative log-likelihood of continuous latent tokens. We demonstrate our approach on conditional video generation from a single image, comparing against a discrete-token baseline and a continuous "mu-only" baseline. Our model achieves the best Fréchet Video Distance (FVD) and generates frames with greater temporal diversity, as measured by SSIM components, at a modest cost to FID.
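The abstract's MoG head (mixture weights, means, and full covariance factors, trained by Gaussian negative log-likelihood) can be sketched roughly as follows. This is a minimal illustrative implementation, not the authors' code: all layer names, dimensions, and the Cholesky parameterization of the full covariance are assumptions.

```python
import torch
import torch.nn as nn

class MoGHead(nn.Module):
    """Illustrative Mixture-of-Gaussians output head (hypothetical names/sizes).

    For each of K components it predicts a mixture logit, a mean vector,
    and a lower-triangular Cholesky factor of a full covariance matrix.
    """
    def __init__(self, hidden_dim: int, token_dim: int, num_components: int):
        super().__init__()
        self.d, self.k = token_dim, num_components
        self.logits = nn.Linear(hidden_dim, num_components)
        self.means = nn.Linear(hidden_dim, num_components * token_dim)
        # Unconstrained entries that are mapped to a valid Cholesky factor.
        self.scale = nn.Linear(hidden_dim, num_components * token_dim * token_dim)

    def forward(self, h: torch.Tensor) -> torch.distributions.Distribution:
        b = h.shape[0]
        logits = self.logits(h)
        means = self.means(h).view(b, self.k, self.d)
        raw = self.scale(h).view(b, self.k, self.d, self.d)
        # Keep the strict lower triangle; softplus the diagonal for positivity.
        tril = torch.tril(raw, diagonal=-1)
        diag = torch.nn.functional.softplus(
            torch.diagonal(raw, dim1=-2, dim2=-1)) + 1e-4
        scale_tril = tril + torch.diag_embed(diag)
        comp = torch.distributions.MultivariateNormal(means, scale_tril=scale_tril)
        mix = torch.distributions.Categorical(logits=logits)
        return torch.distributions.MixtureSameFamily(mix, comp)

def mog_nll(head: MoGHead, h: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of continuous latent tokens under the MoG."""
    return -head(h).log_prob(targets).mean()
```

At inference, one would sample the next continuous token with `head(h).sample()` and feed it back autoregressively; the "mu-only" baseline mentioned in the abstract would correspond to dropping the mixture and covariance terms and regressing a single mean.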
Submission Number: 49