- Keywords: SDE, score matching, diffusion models, ODE, likelihood estimation, variational autoencoder, continuous-time flow
- TL;DR: We show that performing score matching amounts to maximizing a lower bound on the likelihood of plug-in reverse SDEs.
- Abstract: Discrete-time diffusion-based generative models and score matching methods have shown promising results in modeling high-dimensional image data. Recently, Song et al. (2021) showed that diffusion processes can be reversed by learning the score function, i.e., the gradient of the log-density of the perturbed data. They propose plugging the learned score function into a reverse-time formula to define a generative diffusion process. Despite the empirical success, a theoretical underpinning of this procedure is still lacking. In this work, we approach the (continuous-time) generative diffusion directly and derive a variational framework for likelihood estimation, which includes continuous-time normalizing flows as a special case and can be seen as an infinitely deep variational autoencoder. Under this framework, we show that minimizing the score-matching loss is equivalent to maximizing the ELBO of the plug-in reverse SDE proposed by Song et al. (2021), bridging the theoretical gap.
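- For concreteness, the plug-in construction referenced above can be sketched as follows (a standard formulation following Song et al. (2021); the specific symbols $f$, $g$, and $s_\theta$ are illustrative, not taken from this abstract). Given a forward diffusion

  ```latex
  % Forward (noising) SDE perturbing the data:
  \mathrm{d}\mathbf{x} = f(\mathbf{x}, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w},
  % the corresponding reverse-time SDE depends on the score
  % \nabla_{\mathbf{x}} \log p_t(\mathbf{x}); replacing it with a learned
  % approximation s_\theta(\mathbf{x}, t) yields the plug-in reverse SDE:
  \mathrm{d}\mathbf{x} = \left[ f(\mathbf{x}, t) - g(t)^2\, s_\theta(\mathbf{x}, t) \right] \mathrm{d}t + g(t)\,\mathrm{d}\bar{\mathbf{w}},
  ```

  where $\bar{\mathbf{w}}$ denotes a reverse-time Wiener process. The paper's claim is that training $s_\theta$ by score matching maximizes a variational lower bound on the likelihood of this plug-in reverse SDE.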
- Questions/Feedback for Reviewers: We would like to know whether the reviewers have any writing suggestions to improve the readability of the paper; e.g., whether specific parts of the derivation or explanations of the theorem are not clear enough.