Law of Large Numbers for Bayesian Two-Layer Neural Networks Trained with Variational Inference
Abstract: We provide a rigorous analysis of training two-layer Bayesian neural networks by variational inference (VI) in the infinite-width limit. We consider a regression problem with a regularized evidence lower bound (ELBO), which decomposes into the expected log-likelihood of the data and the Kullback-Leibler (KL) divergence between the prior and the variational posterior. With an appropriate weighting of the KL term, we prove a law of large numbers for three training schemes: (i) the idealized case, with exact evaluation of the multiple Gaussian integral arising from the reparametrization trick; (ii) a minibatch scheme using Monte Carlo sampling, commonly known as Bayes by Backprop; and (iii) a new, computationally cheaper algorithm that we introduce as Minimal VI. A key result is that all three methods converge to the same mean-field limit. Finally, we illustrate our results numerically and discuss the need for the derivation of a central limit theorem.
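The ELBO objective described above can be made concrete with a minimal NumPy sketch of the Bayes-by-Backprop-style estimator (scheme (ii)): a mean-field Gaussian variational posterior, the reparametrization trick to sample weights, a Monte Carlo estimate of the expected log-likelihood, and a closed-form KL to a standard normal prior scaled by a weighting factor. The toy data, tanh activation, layer width, noise level, and `kl_weight` are all illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data (illustrative only, not from the paper).
X = rng.normal(size=(50, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=50)

N_HIDDEN = 20        # width of the single hidden layer (assumed)
SIGMA_PRIOR = 1.0    # standard normal prior on all weights
SIGMA_NOISE = 0.1    # observation noise in the Gaussian likelihood
KL_WEIGHT = 1.0      # stand-in for the paper's KL weighting

def unpack(theta):
    """Split a flat parameter vector into first/second-layer weights."""
    w1 = theta[:N_HIDDEN].reshape(1, N_HIDDEN)
    w2 = theta[N_HIDDEN:].reshape(N_HIDDEN, 1)
    return w1, w2

def forward(x, theta):
    """Two-layer network with mean-field 1/N output scaling."""
    w1, w2 = unpack(theta)
    return np.tanh(x @ w1) @ w2 / N_HIDDEN

def elbo_estimate(mu, rho, n_mc=8):
    """Monte Carlo ELBO with mean-field Gaussian posterior q = N(mu, sigma^2).

    Reparametrization trick: theta = mu + sigma * eps, eps ~ N(0, I).
    """
    sigma = np.log1p(np.exp(rho))  # softplus keeps sigma positive
    ll = 0.0
    for _ in range(n_mc):
        eps = rng.normal(size=mu.shape)
        theta = mu + sigma * eps
        pred = forward(X, theta).ravel()
        ll += -0.5 * np.sum((y - pred) ** 2) / SIGMA_NOISE**2
    ll /= n_mc
    # Closed-form KL(q || prior) between diagonal Gaussians.
    kl = 0.5 * np.sum(
        (sigma**2 + mu**2) / SIGMA_PRIOR**2
        - 1.0
        - 2.0 * np.log(sigma / SIGMA_PRIOR)
    )
    return ll - KL_WEIGHT * kl

P = 2 * N_HIDDEN                 # total number of weights
mu = np.zeros(P)                 # variational means
rho = -3.0 * np.ones(P)          # pre-softplus scales (small sigma)
print(elbo_estimate(mu, rho))
```

In a full training loop one would differentiate this estimator with respect to `mu` and `rho` (e.g. with autograd in JAX or PyTorch); the sketch only shows the objective whose scaling limit the paper analyzes.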