- Abstract: Probabilistic models are often trained by maximum likelihood, which corresponds to minimizing a specific form of f-divergence between the model and data distribution. We derive an upper bound that holds for all f-divergences, showing the intuitive result that the divergence between two joint distributions is at least as great as the divergence between their corresponding marginals. Additionally, the f-divergence is not formally defined when two distributions have different supports. We thus propose a noisy version of f-divergence which is well defined in such situations. We demonstrate how the bound and the new version of f-divergence can be readily used to train complex probabilistic generative models of data and that the fitted model can depend significantly on the particular divergence used.
- Keywords: variational inference, generative model, f divergence
- TL;DR: Training generative models using an upper bound of the f divergence.