- Keywords: SDE, score matching, diffusion models, ODE, likelihood estimation, variational autoencoder, continuous-time flow
- TL;DR: We show that performing score matching amounts to maximizing a lower bound on the likelihood of plug-in reverse SDEs.
- Abstract: Discrete-time diffusion-based generative models and score matching methods have shown promising results in modeling high-dimensional image data. Recently, Song et al. (2021) showed that diffusion processes can be reversed by learning the score function, i.e., the gradient of the log-density of the perturbed data. They propose plugging the learned score function into a reverse-time formula to define a generative diffusion process. Despite the empirical success, a theoretical underpinning of this procedure is still lacking. In this work, we approach the (continuous-time) generative diffusion directly and derive a variational framework for likelihood estimation, which includes continuous-time normalizing flows as a special case and can be seen as an infinitely deep variational autoencoder. Under this framework, we show that minimizing the score-matching loss is equivalent to maximizing the ELBO of the plug-in reverse SDE proposed by Song et al. (2021), bridging the theoretical gap.
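- For concreteness, the plug-in construction referenced above can be sketched as follows (a standard formulation following Song et al. (2021); the specific symbols $f$, $g$, and $s_\theta$ are illustrative, not taken from this abstract). Given a forward diffusion

  ```latex
  % Forward (noising) SDE perturbing the data:
  \mathrm{d}\mathbf{x} = f(\mathbf{x}, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w},
  % the corresponding reverse-time SDE depends on the score
  % \nabla_{\mathbf{x}} \log p_t(\mathbf{x}); replacing it with a learned
  % approximation s_\theta(\mathbf{x}, t) yields the plug-in reverse SDE:
  \mathrm{d}\mathbf{x} = \left[ f(\mathbf{x}, t) - g(t)^2\, s_\theta(\mathbf{x}, t) \right] \mathrm{d}t + g(t)\,\mathrm{d}\bar{\mathbf{w}},
  ```

  where $\bar{\mathbf{w}}$ denotes a reverse-time Wiener process. The paper's claim is that training $s_\theta$ by score matching maximizes a variational lower bound on the likelihood of this plug-in reverse SDE.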
- Questions/Feedback for Reviewers: We would like to know whether the reviewers have any writing suggestions to improve the readability of the paper; e.g., whether specific parts of the derivation or explanations of the theorem are not clear enough.