Learning Sparse Structured Ensembles with SG-MCMC and Network Pruning


Nov 07, 2017 (modified: Nov 07, 2017) ICLR 2018 Conference Blind Submission
  • Abstract: An ensemble of neural networks is known to be more robust and accurate than an individual network, but at a linearly increased cost in both training and testing. In this work, we propose a two-stage method to learn Sparse Structured Ensembles (SSEs) for neural networks. In the first stage, we run SG-MCMC with group sparse priors to draw an ensemble of samples from the posterior distribution of network parameters. In the second stage, we apply weight pruning to each sampled network and then retrain over the remaining connections. Learning SSEs with SG-MCMC and pruning in this way significantly reduces memory and computation cost in both training and testing of NN ensembles, while maintaining high prediction accuracy. We verify this empirically in experiments on learning SSEs of both FNNs and LSTMs. For example, in LSTM-based language modeling (LM), we obtain a 12% relative improvement in LM perplexity by learning an SSE of 4 large LSTM models, which in total has only 40% of the model parameters and 90% of the computations of a state-of-the-art large LSTM LM.
  • TL;DR: We propose a novel method that integrates SG-MCMC sampling, group sparse priors, and network pruning to learn Sparse Structured Ensembles (SSEs) with better performance and significantly lower cost than traditional methods.
  • Keywords: ensemble learning, SG-MCMC, group sparse prior, network pruning
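
The two-stage procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes stochastic gradient Langevin dynamics (SGLD) as the simplest SG-MCMC variant, a linear model standing in for a neural network, a group-lasso-style penalty with each row of the weight matrix treated as one group, and plain magnitude pruning. All function names and hyperparameters are hypothetical.

```python
import numpy as np

def sgld_samples(X, y, n_samples=4, steps_per_sample=500, lr=1e-3,
                 lam=1.0, batch=32, seed=0):
    """Stage 1 (assumed form): SGLD on y ~ X @ W with a group sparse prior."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = 0.1 * rng.standard_normal((d, y.shape[1]))
    samples = []
    for t in range(n_samples * steps_per_sample):
        idx = rng.choice(n, size=batch, replace=False)
        Xb, yb = X[idx], y[idx]
        # Minibatch gradient of the Gaussian negative log-likelihood,
        # rescaled to estimate the full-data gradient.
        grad_nll = (n / batch) * Xb.T @ (Xb @ W - yb)
        # Gradient of lam * sum_g ||W_g||_2, with rows of W as groups.
        row_norms = np.linalg.norm(W, axis=1, keepdims=True)
        grad_prior = lam * W / (row_norms + 1e-8)
        # Langevin step: gradient descent plus Gaussian noise of variance 2*lr.
        noise = np.sqrt(2.0 * lr) * rng.standard_normal(W.shape)
        W = W - lr * (grad_nll + grad_prior) + noise
        if (t + 1) % steps_per_sample == 0:
            samples.append(W.copy())  # keep one sample per interval
    return samples

def prune_mask(W, keep_frac):
    """Boolean mask keeping the largest-magnitude fraction of weights."""
    k = max(1, int(round(keep_frac * W.size)))
    thresh = np.sort(np.abs(W), axis=None)[-k]
    return np.abs(W) >= thresh

def retrain(X, y, W, mask, lr=1e-3, steps=200):
    """Full-batch gradient descent restricted to the unpruned connections."""
    W = W * mask
    for _ in range(steps):
        grad = X.T @ (X @ W - y)
        W = (W - lr * grad) * mask  # pruned weights stay at zero
    return W

def learn_sparse_ensemble(X, y, n_samples=4, keep_frac=0.5):
    """Stage 1 then stage 2: sample, prune each sample, retrain it."""
    ensemble = []
    for W in sgld_samples(X, y, n_samples=n_samples):
        mask = prune_mask(W, keep_frac)
        ensemble.append((retrain(X, y, W, mask), mask))
    return ensemble

def ensemble_predict(X, ensemble):
    """Average the predictions of all pruned, retrained members."""
    return np.mean([X @ W for W, _ in ensemble], axis=0)
```

Calling `learn_sparse_ensemble(X, y, n_samples=4, keep_frac=0.5)` on a regression dataset returns four `(weights, mask)` pairs, and `ensemble_predict` averages their outputs. The step sizes here are tuned for small, roughly standardized inputs and would need adjusting for other data.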