Bayesian Deep Learning via Stochastic Gradient MCMC with a Stochastic Approximation Adaptation

27 Sept 2018 (modified: 05 May 2023) · ICLR 2019 Conference Blind Submission
Abstract: We propose a robust Bayesian deep learning algorithm to infer complex posteriors with latent variables. Inspired by dropout, a popular tool for regularization and model ensembling, we assign sparse priors to the weights in deep neural networks (DNNs) in order to achieve automatic "dropout" and avoid over-fitting. By alternately sampling from the posterior distribution through stochastic gradient Markov chain Monte Carlo (SG-MCMC) and optimizing the latent variables via stochastic approximation (SA), the trajectory of the target weights is proven to converge to the true posterior distribution conditioned on the optimal latent variables. This ensures stronger regularization on the over-fitted parameter space and more accurate uncertainty quantification on the decisive variables. Simulations on large-p-small-n regressions showcase the robustness of this method when applied to models with latent variables. Additionally, its application to convolutional neural networks (CNNs) leads to state-of-the-art performance on the MNIST and Fashion-MNIST datasets and improved resistance to adversarial attacks.
Keywords: generalized stochastic approximation, stochastic gradient Markov chain Monte Carlo, adaptive algorithm, EM algorithm, convolutional neural networks, Bayesian inference, sparse prior, spike and slab prior, local trap
TL;DR: a robust Bayesian deep learning algorithm to infer complex posteriors with latent variables
Data: [Fashion-MNIST](https://paperswithcode.com/dataset/fashion-mnist)
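
The abstract describes an alternating scheme: SG-MCMC draws of the network weights interleaved with stochastic-approximation updates of the latent variables attached to the sparse (spike-and-slab) prior. The sketch below is not the authors' code; it illustrates the general idea on a toy large-p-small-n linear regression, using stochastic gradient Langevin dynamics (SGLD) for the sampling step and a Robbins-Monro recursion for the latent inclusion probabilities. All hyperparameters (prior scales, step sizes, data dimensions) are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy large-p-small-n regression: n = 50 observations, p = 200 predictors,
# only the first 5 coefficients are truly nonzero (assumed setup).
n, p = 50, 200
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 2.0
y = X @ beta_true + 0.5 * rng.standard_normal(n)

sigma_noise = 0.5                    # observation noise scale, assumed known
sigma_slab, sigma_spike = 1.0, 0.01  # spike-and-slab prior scales (assumed)
pi_inclusion = 0.1                   # prior inclusion probability (assumed)

beta = np.zeros(p)       # weights, sampled by SGLD
rho = np.full(p, 0.5)    # latent inclusion probabilities, updated by SA

batch_size, n_iter = 25, 5000
eps = 1e-4               # constant SGLD step size for simplicity;
                         # the theory uses a decreasing schedule
for t in range(1, n_iter + 1):
    # --- SG-MCMC step: SGLD update of the weights on a minibatch ---
    idx = rng.choice(n, batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    grad_loglik = (n / batch_size) * Xb.T @ (yb - Xb @ beta) / sigma_noise**2
    # Prior gradient mixes slab and spike components via the current rho.
    prior_prec = rho / sigma_slab**2 + (1.0 - rho) / sigma_spike**2
    grad_logprior = -prior_prec * beta
    beta += 0.5 * eps * (grad_loglik + grad_logprior)
    beta += np.sqrt(eps) * rng.standard_normal(p)

    # --- SA step: EM-like update of the latent inclusion probabilities ---
    slab = pi_inclusion * np.exp(-0.5 * beta**2 / sigma_slab**2) / sigma_slab
    spike = (1 - pi_inclusion) * np.exp(-0.5 * beta**2 / sigma_spike**2) / sigma_spike
    rho_target = slab / (slab + spike)
    gamma = 1.0 / t**0.6  # Robbins-Monro step size for the SA recursion
    rho += gamma * (rho_target - rho)

print("coefficients flagged as nonzero:", np.where(rho > 0.5)[0])
```

In this toy version the SA recursion plays the role of the latent-variable optimization: coefficients whose sampled values stay near zero are pulled toward the spike (an automatic "dropout" of that weight), while coefficients with sustained signal migrate to the slab. The paper's algorithm applies the same alternation to DNN/CNN weights rather than a linear model.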
