Gradient Regularisation as Approximate Variational Inference

Nov 23, 2020 (edited Jan 11, 2021), AABI 2020
  • Keywords: Variational Inference, Fisher Information, SGD
  • Abstract: Variational inference in Bayesian neural networks is usually performed using stochastic sampling, which gives very high-variance gradients and hence slow learning. Here, we show that it is possible to obtain a deterministic approximation of the ELBO for a Bayesian neural network by doing a Taylor-series expansion around the mean of the current variational distribution. The resulting approximate ELBO is the training log-likelihood plus a squared-gradient regulariser. In addition to learning the approximate posterior variance, we also consider a uniform-variance approximate posterior, inspired by the stationary distribution of SGD. The corresponding approximate ELBO has a simple form: the log-likelihood plus a squared-gradient regulariser. We argue that this squared-gradient regularisation may be at the root of the excellent empirical performance of SGD.
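
The objective described in the abstract — a training loss plus a squared-gradient penalty whose weight plays the role of a uniform approximate-posterior variance — can be sketched deterministically. The following minimal JAX example is illustrative only: the toy regression data, the function names, and the penalty weight `lam` are assumptions for the sketch, not details from the paper.

```python
import jax
import jax.numpy as jnp

# Toy regression problem (hypothetical data, for illustration only).
X = jnp.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = jnp.array([1.0, 2.0, 3.0])

def nll(params):
    # Gaussian negative log-likelihood up to a constant (sum of squared errors).
    preds = X @ params
    return 0.5 * jnp.sum((preds - y) ** 2)

def regularised_loss(params, lam=0.01):
    # Deterministic approximation of the (negative) ELBO in the style the
    # abstract describes: training loss plus a squared-gradient regulariser,
    # with lam standing in for the uniform approximate-posterior variance.
    g = jax.grad(nll)(params)
    return nll(params) + 0.5 * lam * jnp.sum(g ** 2)

# The regularised objective is itself differentiable, so ordinary
# gradient descent can be run on it (JAX differentiates through jax.grad).
params = jnp.zeros(2)
loss, grads = jax.value_and_grad(regularised_loss)(params)
```

Because the penalty is a plain function of the gradient at the variational mean, no posterior sampling is needed, which is the source of the variance reduction the abstract claims over stochastic ELBO estimators.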