Keywords: SDE, score matching, diffusion models, ODE, likelihood estimation, variational autoencoder, continuous-time flow
TL;DR: We show that performing score matching amounts to maximizing a lower bound on the likelihood of plug-in reverse SDEs.
Abstract: Discrete-time diffusion-based generative models and score matching methods have shown promising results in modeling high-dimensional image data. Recently, Song et al. (2021) showed that diffusion processes can be reversed by learning the score function, i.e., the gradient of the log-density of the perturbed data. They propose plugging the learned score function into a reverse-time formula to define a generative diffusion process. Despite the empirical success, a theoretical underpinning of this procedure is still lacking. In this work, we approach the (continuous-time) generative diffusion directly and derive a variational framework for likelihood estimation, which includes continuous-time normalizing flows as a special case and can be seen as an infinitely deep variational autoencoder. Under this framework, we show that minimizing the score-matching loss is equivalent to maximizing the ELBO of the plug-in reverse SDE proposed by Song et al. (2021), bridging the theoretical gap.
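To make the plug-in reverse SDE concrete, here is a minimal numerical sketch. It assumes a 1-D toy setting in which the data distribution is a standard Gaussian and a constant-coefficient VP-type forward SDE, so the score $\nabla_x \log p_t(x) = -x$ is known in closed form (in practice the score is learned via score matching); the sampler is a plain Euler-Maruyama discretization of the reverse-time SDE $dx = [f(x,t) - g(t)^2 \nabla_x \log p_t(x)]\,dt + g(t)\,d\bar{W}$, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting (assumption): 1-D standard-Gaussian data with forward SDE
# dx = -0.5 * beta * x dt + sqrt(beta) dW, so the marginal p_t stays N(0, 1)
# and the exact score is available in closed form.
beta = 1.0
T, n_steps, n_samples = 1.0, 1000, 5000
dt = T / n_steps

def score(x, t):
    # Exact score of the stationary marginal N(0, 1): d/dx log p_t(x) = -x.
    # In a real model this would be a neural network trained by score matching.
    return -x

# Plug-in reverse SDE, integrated backwards from t = T to t = 0 with
# Euler-Maruyama: dx = [f(x,t) - g(t)^2 * score(x,t)] dt + g(t) dW.
x = rng.standard_normal(n_samples)  # start from the prior at t = T
for i in range(n_steps):
    t = T - i * dt
    f = -0.5 * beta * x             # drift of the forward SDE
    g = np.sqrt(beta)               # diffusion coefficient
    drift = f - g**2 * score(x, t)
    x = x + drift * (-dt) + g * np.sqrt(dt) * rng.standard_normal(n_samples)

# The generated samples should match the data distribution N(0, 1).
print(x.mean(), x.std())
```

With the exact score, the reverse-time samples recover the data distribution (empirical mean near 0, standard deviation near 1); the paper's result is that training the score network by score matching maximizes an ELBO on the likelihood of exactly this plug-in sampler.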
Questions/Feedback Request for Reviewers: We would like to know whether the reviewers have any writing suggestions to improve the readability of the paper, e.g., if specific parts of the derivation or explanations of the theorem are not clear enough.
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/a-variational-perspective-on-diffusion-based/code)