Keywords: Generative Models, Kernel Methods, Optimal Transportation, Probability Divergences
Abstract: We study the gradient flow for a relaxed approximation to the Kullback-Leibler (KL) divergence
between a moving source and a fixed target distribution.
This approximation, termed the
KALE (KL approximate lower-bound estimator), solves a regularized version of
the Fenchel dual problem defining the KL over a restricted class of functions.
When using a Reproducing Kernel Hilbert Space (RKHS) to define the function
class, we show that the KALE continuously interpolates between the KL and the
Maximum Mean Discrepancy (MMD). Like the MMD and other Integral Probability
Metrics, the KALE remains well defined for mutually singular
distributions. Nonetheless, the KALE inherits from the limiting KL a greater
sensitivity to mismatch in the support of the distributions, compared with the MMD. These two properties make the
KALE gradient flow particularly well suited when the target distribution is supported on a low-dimensional manifold. Under an assumption of sufficient smoothness of the trajectories, we show the global convergence of the KALE flow. We propose a particle implementation of the flow given initial samples from the source and the target distribution, which we use to empirically confirm the KALE's properties.
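For reference, a hedged sketch of the variational form described above, assuming the standard Fenchel-dual representation of the KL with an RKHS norm penalty of weight lambda > 0 (exact scaling conventions may differ from the paper's):
\[
\mathrm{KL}(P\|Q) \;=\; \sup_{f}\;\Big\{\mathbb{E}_{P}[f] \;-\; \mathbb{E}_{Q}\big[e^{f}-1\big]\Big\},
\qquad
\mathrm{KALE}_{\lambda}(P\|Q) \;\approx\; (1+\lambda)\,\max_{f\in\mathcal{H}}\;\Big\{\mathbb{E}_{P}[f] \;-\; \mathbb{E}_{Q}\big[e^{f}-1\big] \;-\; \tfrac{\lambda}{2}\,\|f\|_{\mathcal{H}}^{2}\Big\},
\]
where the unrestricted supremum recovers the KL, restricting to the RKHS \(\mathcal{H}\) and taking \(\lambda \to 0\) approaches the KL (for a sufficiently rich \(\mathcal{H}\)), and large \(\lambda\) yields, up to rescaling, an MMD-like behaviour.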
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
Supplementary Material: pdf
TL;DR: We introduce the KALE, a kernel-based probability divergence that interpolates between the KL and the MMD, and study its Wasserstein gradient flow.