EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: Equivariance regularization of autoencoders boosts latent generative modeling
Abstract: Latent generative models have emerged as a leading approach for high-quality image synthesis. These models rely on an autoencoder to compress images into a latent space, followed by a generative model to learn the latent distribution. We identify that existing autoencoders lack equivariance to semantic-preserving transformations like scaling and rotation, resulting in complex latent spaces that hinder generative performance. To address this, we propose EQ-VAE, a simple regularization approach that enforces equivariance in the latent space, reducing its complexity without degrading reconstruction quality. By fine-tuning pre-trained autoencoders with EQ-VAE, we enhance the performance of several state-of-the-art generative models, including DiT, SiT, REPA, and MaskGIT, achieving a ×7 speedup on DiT-XL/2 with only five epochs of SD-VAE fine-tuning. EQ-VAE is compatible with both continuous and discrete autoencoders, thus offering a versatile enhancement for a wide range of latent generative models.
Lay Summary: Latent generative models are widely used for high-quality image synthesis but face challenges due to the complexity of the latent space created by existing autoencoders. We find that state-of-the-art autoencoders lack equivariance to simple transformations like scaling and rotation, resulting in complex latent spaces that hinder generative performance. We introduce EQ-VAE, a new method that enforces equivariance in the latent space of autoencoders, so that simple transformations of the image yield correspondingly simple transformations of the latent representation. This is achieved with a simple regularization loss that is compatible with both discrete and continuous autoencoders. By fine-tuning pre-trained autoencoders with EQ-VAE, we significantly improve the performance of various generative models, including latent diffusion models such as DiT, SiT, and REPA, and masked generative models such as MaskGIT. For example, with EQ-VAE, DiT trains up to 7× faster without compromising reconstruction quality.
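The core idea of the regularization can be sketched in a few lines: decoding a transformed latent should match the transformed image, i.e. D(τ(E(x))) ≈ τ(x) for a semantic-preserving transformation τ such as rotation or scaling. Below is a minimal, hypothetical numpy sketch (function names and the loss weighting are illustrative assumptions, not the paper's exact training objective):

```python
import numpy as np

def eq_vae_loss(encode, decode, x, transform, weight=1.0):
    """Illustrative sketch of an EQ-VAE-style equivariance regularizer.

    encode/decode: the autoencoder's encoder and decoder
    transform: a semantic-preserving transformation tau (e.g. rotation)
               that can be applied to both images and latents
    """
    z = encode(x)
    # Standard reconstruction term: D(E(x)) should match x.
    rec_loss = np.mean((decode(z) - x) ** 2)
    # Equivariance term: decoding the transformed latent tau(z)
    # should match the transformed image tau(x).
    eq_loss = np.mean((decode(transform(z)) - transform(x)) ** 2)
    return rec_loss + weight * eq_loss

# Toy check: an identity autoencoder is trivially equivariant,
# so both terms vanish for any transformation.
x = np.random.rand(8, 8)
identity = lambda a: a
loss = eq_vae_loss(identity, identity, x, np.rot90)
```

For a real autoencoder the equivariance term is generally nonzero, and minimizing it during fine-tuning is what simplifies the latent space.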
Link To Code: https://github.com/zelaki/eqvae
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: autoencoders, latent generative models, regularization
Submission Number: 6550