Keywords: Generative Models, Diffusion Models, Image Diffusion
TL;DR: A learned noise schedule can improve the log-likelihood of diffusion models.
Abstract: Diffusion models have gained traction as powerful algorithms for synthesizing high-quality images. Central to these algorithms is the diffusion process, a set of equations that maps data to noise in a way that can significantly affect performance. In this paper, we explore whether the diffusion process can be learned from data. Our work is grounded in Bayesian inference and seeks to improve log-likelihood estimation by casting the learned diffusion process as an approximate variational posterior that yields a tighter lower bound (ELBO) on the likelihood. A widely held assumption is that the ELBO is invariant to the noise process; our work dispels this assumption and proposes multivariate learned adaptive noise (MuLAN), a learned diffusion process that applies noise at different rates across an image. Our method consists of three components: a multivariate noise schedule, adaptive input-conditional diffusion, and auxiliary variables. Together, these components ensure that the ELBO is no longer invariant to the choice of noise schedule, as it was in previous works. Empirically, MuLAN sets a new **state-of-the-art** in density estimation on CIFAR-10 and ImageNet while matching the performance of previous state-of-the-art models with **50%** fewer steps. We provide the code, along with a blog post and video tutorial, on the project page: https://s-sahoo.com/MuLAN
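To make the multivariate noise schedule concrete, here is a minimal sketch (not from the paper) of what a per-pixel, input-conditioned forward diffusion step might look like. It follows the common log-SNR parameterization from variational diffusion models; `gamma_net` is a hypothetical network that maps an image and a timestep to a log-SNR value for every pixel, so noise is injected at different rates across the image rather than at a single scalar rate.

```python
import torch

def multivariate_forward_diffusion(gamma_net, x, t):
    """Hedged sketch of a multivariate noise schedule.

    gamma_net (hypothetical): maps (x, t) to a per-pixel log-SNR tensor
    with the same shape as x, so each pixel gets its own noise rate.
    """
    gamma = gamma_net(x, t)                    # per-pixel log-SNR, shape of x
    alpha = torch.sqrt(torch.sigmoid(-gamma))  # elementwise signal scale
    sigma = torch.sqrt(torch.sigmoid(gamma))   # elementwise noise scale
    eps = torch.randn_like(x)
    z_t = alpha * x + sigma * eps              # sample from q(z_t | x)
    return z_t, eps
```

With a scalar schedule, `gamma` would be a single number per timestep shared by all pixels; making it a tensor conditioned on `x` is what allows the ELBO to depend on, and thus be improved by, the learned schedule.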
Primary Area: Generative models
Submission Number: 13843