A Variational Perspective on Diffusion-Based Generative Models and Score Matching

Published: 09 Nov 2021, Last Modified: 05 May 2023
Venue: NeurIPS 2021 Spotlight
Keywords: Diffusion model, Neural SDE, Neural ODE, normalizing flows, score matching, hierarchical VAE, generative modelling, maximum likelihood, variational lower bound
Abstract: Discrete-time diffusion-based generative models and score matching methods have shown promising results in modeling high-dimensional image data. Recently, Song et al. (2021) showed that diffusion processes that transform data into noise can be reversed by learning the score function, i.e., the gradient of the log-density of the perturbed data. They propose to plug the learned score function into a reverse-time formula to define a generative diffusion process. Despite the empirical success, a theoretical underpinning of this procedure is still lacking. In this work, we approach the (continuous-time) generative diffusion directly and derive a variational framework for likelihood estimation, which includes continuous-time normalizing flows as a special case and can be seen as an infinitely deep variational autoencoder. Under this framework, we show that minimizing the score-matching loss is equivalent to maximizing a lower bound on the likelihood of the plug-in reverse SDE proposed by Song et al. (2021), bridging the theoretical gap.
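For context, the "plug-in reverse SDE" referenced above follows the time-reversal formula (Anderson, 1982) used by Song et al. (2021). A minimal sketch in generic notation (the symbols f, g, and s_theta below are illustrative and need not match the paper's exact notation): a forward SDE

\[ \mathrm{d}X_t = f(X_t, t)\,\mathrm{d}t + g(t)\,\mathrm{d}W_t \]

perturbs data into noise, and the generative reversal replaces the true score with a learned approximation \( s_\theta(x, t) \approx \nabla_x \log p_t(x) \):

\[ \mathrm{d}\bar{X}_t = \big[ f(\bar{X}_t, t) - g(t)^2\, s_\theta(\bar{X}_t, t) \big]\,\mathrm{d}t + g(t)\,\mathrm{d}\bar{W}_t, \]

where \( \bar{W}_t \) is a reverse-time Brownian motion.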
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
Supplementary Material: pdf
TL;DR: We derived an ELBO for continuous-time diffusion models using stochastic calculus and drew connections to continuous-time normalizing flows, hierarchical VAEs, and score-based generative models.
Code: https://github.com/CW-Huang/sdeflow-light
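The repository above is a PyTorch implementation. For orientation only, here is a minimal, self-contained sketch of a denoising score-matching objective of the kind the paper analyzes; the names (dsm_loss, ToyScoreNet) and the specific noise schedule are hypothetical illustrations, not the repository's actual API.

import torch

def dsm_loss(score_net, x0, T=1.0, eps=1e-5):
    # Denoising score matching for a Gaussian perturbation kernel
    # x_t = alpha(t) * x0 + sigma(t) * z,  z ~ N(0, I).
    batch = x0.shape[0]
    t = torch.rand(batch, device=x0.device) * (T - eps) + eps  # t ~ Uniform(eps, T)
    # Illustrative variance-preserving-style schedule (an assumption,
    # not necessarily the paper's choice):
    alpha = torch.exp(-0.5 * t).view(-1, *([1] * (x0.dim() - 1)))
    sigma = torch.sqrt(1.0 - alpha ** 2)
    z = torch.randn_like(x0)
    xt = alpha * x0 + sigma * z
    # Score of the conditional kernel: grad_x log p(x_t | x_0) = -z / sigma.
    target = -z / sigma
    pred = score_net(xt, t)
    # sigma(t)^2 weighting is a common "simple" choice; maximum-likelihood
    # training weights by g(t)^2 instead (Song et al., 2021; this paper).
    return ((sigma ** 2) * (pred - target) ** 2).mean()

class ToyScoreNet(torch.nn.Module):
    # Tiny MLP taking (x, t) and returning an estimated score of p_t.
    def __init__(self, dim):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim + 1, 128),
            torch.nn.SiLU(),
            torch.nn.Linear(128, dim),
        )
    def forward(self, x, t):
        return self.net(torch.cat([x, t[:, None]], dim=-1))

# Usage on toy 2-D data:
net = ToyScoreNet(2)
loss = dsm_loss(net, torch.randn(64, 2))
loss.backward()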