STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis

Jiatao Gu; Tianrong Chen; David Berthelot; Huangjie Zheng; Yuyang Wang; Ruixiang ZHANG; Laurent Dinh; Miguel Ángel Bautista; Joshua M. Susskind; Shuangfei Zhai

Published: 18 Sept 2025, Last Modified: 15 Dec 2025. Venue: NeurIPS 2025 spotlight. License: CC BY 4.0

Keywords: Normalizing Flows, Autoregressive Models, Latent Autoregressive Flows, Scalable Image Generation

TL;DR: We propose STARFlow, a scalable autoregressive flow model that enables high-quality image generation

Abstract: We present STARFlow, a scalable generative model based on normalizing flows that achieves strong performance on high-resolution image synthesis. STARFlow's main building block is the Transformer Autoregressive Flow (TARFlow), which combines normalizing flows with autoregressive Transformer architectures and has recently achieved impressive results in image modeling. In this work, we first establish the theoretical universality of TARFlow for modeling continuous distributions. Building on this foundation, we introduce a set of architectural and algorithmic innovations that significantly enhance its scalability: (1) a deep-shallow design where a deep Transformer block captures most of the model's capacity, followed by a few shallow Transformer blocks that are computationally cheap yet contribute non-negligibly, (2) learning in the latent space of pretrained autoencoders, which proves far more effective than modeling pixels directly, and (3) a novel guidance algorithm that substantially improves sample quality. Crucially, our model remains a single, end-to-end normalizing flow, allowing exact maximum likelihood training in continuous space without discretization. STARFlow achieves competitive results in both class- and text-conditional image generation, with sample quality approaching that of state-of-the-art diffusion models. To our knowledge, this is the **first** successful demonstration of normalizing flows at this scale and resolution. Code and weights are available at https://github.com/apple/ml-starflow.
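
To make "exact maximum likelihood training in continuous space" concrete, the following is a minimal sketch of a generic affine autoregressive flow, not the STARFlow or TARFlow implementation. It assumes a toy hand-written conditioner in place of a causal Transformer; the names `conditioner`, `forward`, `inverse`, `log_likelihood`, and the two-element `weights` tuple are all hypothetical and chosen only for illustration. The sketch shows how each element is shifted and scaled using only earlier elements, which makes the Jacobian triangular, so the log-likelihood follows exactly from the change-of-variables formula.

```python
import math

def conditioner(prefix, weights):
    """Toy stand-in for a causal network: maps x_{<t} to (mu_t, log_sigma_t).

    Assumption: a real model would use a causal Transformer here.
    """
    w_mu, w_ls = weights
    s = sum(prefix)  # depends only on earlier elements (empty prefix -> 0)
    return w_mu * s, math.tanh(w_ls * s)  # tanh keeps the log-scale bounded

def forward(x, weights):
    """Map data x to noise z and return log|det dz/dx|."""
    z, logdet = [], 0.0
    for t in range(len(x)):
        mu, log_sigma = conditioner(x[:t], weights)
        z.append((x[t] - mu) * math.exp(-log_sigma))
        logdet -= log_sigma  # triangular Jacobian: diagonal entries exp(-log_sigma)
    return z, logdet

def inverse(z, weights):
    """Map noise z back to data x, one element at a time (sequential sampling)."""
    x = []
    for t in range(len(z)):
        mu, log_sigma = conditioner(x, weights)  # x currently holds x_{<t}
        x.append(z[t] * math.exp(log_sigma) + mu)
    return x

def log_likelihood(x, weights):
    """Exact log p(x) under a standard normal prior on z."""
    z, logdet = forward(x, weights)
    log_prior = sum(-0.5 * v * v - 0.5 * math.log(2 * math.pi) for v in z)
    return log_prior + logdet
```

In this sketch, training would maximize `log_likelihood` over data with respect to the conditioner's parameters, while sampling draws `z` from a standard normal and calls `inverse`. The forward pass can evaluate all positions in parallel given `x`, whereas the inverse must proceed one element at a time.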

Supplementary Material: zip

Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)

Submission Number: 10954
