Globally Convergent Variational Inference

Published: 27 May 2024, Last Modified: 27 May 2024 · AABI 2024 · CC BY 4.0
Keywords: convex optimization; neural tangent kernel; forward KL divergence
TL;DR: We utilize neural network asymptotics and convexity of the forward KL divergence to show that an amortized inference method minimizing this objective converges asymptotically to a unique global optimum in function space.
Abstract: In amortized variational inference, among the many possible variational objectives, the inclusive (or forward) KL divergence is a popular choice for fitting the encoder network because the resulting objective is "likelihood-free". While this approach has been shown empirically to find optimal variational distributions in a variety of settings, this success has not yet been well motivated mathematically. In this work, we provide mathematical justification for optimizing this objective compared to alternatives such as the evidence lower bound (ELBO). We present a novel functional analysis that utilizes neural network asymptotics as the number of neurons grows large. In the asymptotic regime of a fixed, positive-definite neural tangent kernel (NTK), we establish conditions under which the expected forward KL objective admits a unique solution (that is, a neural network) in a reproducing kernel Hilbert space (RKHS) of functions. We then show theoretically that gradient descent dynamics converge to this global optimum.
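The abstract states the objective and training regime only in words; the following is a minimal hedged sketch, in notation of our own choosing rather than the paper's (the encoder q_\phi(z|x), model joint p(x,z), encoder output function f, NTK K, and input measure \mu are assumed labels), of the amortized forward KL objective, why it is likelihood-free, and the fixed-NTK gradient-flow picture the abstract refers to.

% Hedged sketch (notation ours): amortized inclusive (forward) KL objective.
\begin{align}
  \mathcal{L}(\phi)
    &= \mathbb{E}_{p(x)}\big[\,\mathrm{KL}\big(p(z \mid x)\,\|\,q_\phi(z \mid x)\big)\big] \\
    &= \mathbb{E}_{p(x,z)}\big[-\log q_\phi(z \mid x)\big] + \mathrm{const},
\end{align}
% so the objective is "likelihood-free": an unbiased gradient estimate needs
% only ancestral samples (x^{(i)}, z^{(i)}) \sim p(x, z), never the marginal
% likelihood p(x) itself:
\begin{equation}
  \nabla_\phi \mathcal{L}(\phi)
    \approx -\frac{1}{N} \sum_{i=1}^{N}
      \nabla_\phi \log q_\phi\big(z^{(i)} \mid x^{(i)}\big).
\end{equation}
% Viewing the encoder output as a function f (so q_\phi = q_{f_\theta} for a
% neural network f_\theta), training in the fixed, positive-definite NTK
% regime behaves like a kernel gradient flow in the RKHS of K; schematically,
\begin{equation}
  \partial_t f_t(x) = -\int K(x, x')\,
      \frac{\delta \mathcal{L}}{\delta f}\![f_t](x')\,\mathrm{d}\mu(x'),
\end{equation}
% and convexity of \mathcal{L} in f together with positive-definiteness of K
% is what yields a unique minimizer and global convergence in function space.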
Submission Number: 20