Jet: A Modern Transformer-Based Normalizing Flow

Published: 22 Apr 2025, Last Modified: 22 Apr 2025. Accepted by TMLR. License: CC BY 4.0
Abstract: In the past, normalizing flows emerged as a promising class of generative models for natural images. This model class has several advantages: efficient computation of the exact log-likelihood of the input data, fast generation, and a simple overall structure. Normalizing flows remained a topic of active research but later fell out of favor, as the visual quality of their samples was not competitive with other model classes, such as GANs, VQ-VAE-based approaches, or diffusion models. In this paper we revisit the design of coupling-based normalizing flow models by carefully ablating prior design choices and using computational blocks based on the Vision Transformer architecture rather than convolutional neural networks. As a result, we achieve a much simpler architecture that matches existing normalizing flow models and improves over them when paired with pretraining. While the overall visual quality still lags behind current state-of-the-art models, we argue that strong normalizing flow models can help advance the research frontier by serving as building blocks of more powerful generative models.
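The central mechanism the abstract refers to is the coupling layer: half the dimensions pass through unchanged and parameterize an invertible affine map of the other half, so the log-determinant of the Jacobian is cheap to compute. Below is a minimal PyTorch-style sketch of one such layer, using a single Transformer encoder layer as the conditioner in place of convolutions. The class name `AffineCoupling`, the tanh-bounded scales, and all hyperparameters here are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One affine coupling layer: split features in half, predict a scale
    and shift for the second half from the first, keep the map invertible."""

    def __init__(self, dim: int, hidden: int = 512, n_heads: int = 4):
        super().__init__()
        half = dim // 2
        # Hypothetical conditioner: one Transformer encoder layer over the
        # token sequence, standing in for the paper's ViT-based blocks.
        self.body = nn.TransformerEncoderLayer(
            d_model=half, nhead=n_heads, dim_feedforward=hidden,
            batch_first=True)
        self.proj = nn.Linear(half, 2 * half)

    def forward(self, x: torch.Tensor):
        # x: (batch, tokens, dim) -> z and per-example log|det Jacobian|.
        x1, x2 = x.chunk(2, dim=-1)
        log_s, t = self.proj(self.body(x1)).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)          # bound scales for stability
        z2 = x2 * log_s.exp() + t
        # Jacobian is triangular, so log-det is just the sum of log-scales.
        logdet = log_s.sum(dim=(1, 2))
        return torch.cat([x1, z2], dim=-1), logdet

    def inverse(self, z: torch.Tensor) -> torch.Tensor:
        # Exact inversion: recompute scale/shift from the untouched half.
        z1, z2 = z.chunk(2, dim=-1)
        log_s, t = self.proj(self.body(z1)).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)
        x2 = (z2 - t) * (-log_s).exp()
        return torch.cat([z1, x2], dim=-1)
```

Stacking many such layers (with the split permuted between layers) and summing the returned `logdet` terms against a simple base density gives the exact log-likelihood the abstract highlights, while `inverse` gives fast sampling.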
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
  • Make SOTA claims more precise by emphasising that we rely on pretraining to beat SOTA and only match SOTA in the standard data-limited settings.
  • Improve Fig. 3 by adding NLL results for the train split.
  • Additional comparison to the Denseflow model in the Appendix.
  • Typo fixes.
Assigned Action Editor: Ole Winther
Submission Number: 3927