Keywords: diffusion models, diffusion autoencoders, generative models, representation learning
TL;DR: We propose DMZ, a diffusion autoencoder that combines efficient generation with meaningful representation learning, and study design choices and various applications.
Abstract: Diffusion autoencoders (DAs) are variants of diffusion generative models that use an input-dependent latent variable to capture representations alongside the diffusion process. These representations can be used for tasks such as downstream classification, controllable generation, and interpolation. However, the generative performance of DAs relies heavily on how well the prior distribution over the latent variables can be modelled and subsequently sampled from. Better generative modelling is also the goal of another class of diffusion models—those that learn their forward (noising) process. While effective at adjusting the noise process in an input-dependent manner, these models must satisfy additional constraints derived from the terminal conditions of the diffusion process. Here, we draw a connection between these two classes of models and show that certain design decisions in the DA framework (latent variable choice, conditioning method, etc.) lead to a model we term DMZ. DMZ learns effective representations, as evaluated on downstream tasks including domain transfer, while enabling more efficient modelling and generation with fewer denoising steps than standard diffusion models.
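The core mechanism the abstract describes (an encoder producing an input-dependent latent that conditions the denoiser during the diffusion process) can be sketched as a single training step. This is a minimal illustrative sketch only: the linear "networks", the latent and data dimensions, the cosine schedule, and the epsilon-prediction loss are all assumptions for exposition, not the DMZ architecture or the paper's training objective.

```python
# Hedged sketch of one diffusion-autoencoder training step, in NumPy.
# Everything here (shapes, linear stand-in networks, cosine schedule)
# is an illustrative assumption, not the DMZ model from the paper.
import numpy as np

rng = np.random.default_rng(0)
d, dz, T = 8, 2, 1000  # data dim, latent dim, number of diffusion steps

W_enc = rng.normal(size=(dz, d)) * 0.1          # stand-in encoder network
W_den = rng.normal(size=(d, d + dz + 1)) * 0.1  # stand-in conditional denoiser

def alpha_bar(t):
    # Cosine-style cumulative noise schedule (an illustrative choice).
    return np.cos(0.5 * np.pi * t / T) ** 2

def da_loss(x, t):
    """Epsilon-prediction loss for one example x at timestep t."""
    z = W_enc @ x                     # input-dependent latent representation
    eps = rng.normal(size=d)          # forward-process (noising) sample
    ab = alpha_bar(t)
    x_t = np.sqrt(ab) * x + np.sqrt(1.0 - ab) * eps  # noised input
    # The denoiser is conditioned on the noisy input, the latent z,
    # and the (normalised) timestep.
    eps_hat = W_den @ np.concatenate([x_t, z, [t / T]])
    return float(np.mean((eps_hat - eps) ** 2))

x = rng.normal(size=d)
loss = da_loss(x, t=500)
print(loss >= 0.0)  # the loss is a non-negative scalar
```

At sampling time, the DA setting additionally requires a prior over z to sample from, which is the dependence the abstract highlights as the bottleneck for generative performance.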
Primary Area: generative models
Submission Number: 9087