Open Peer Review. Open Publishing. Open Access. Open Discussion. Open Directory. Open Recommendations. Open API. Open Source.
Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches
Nov 07, 2017 (modified: Nov 07, 2017)ICLR 2018 Conference Blind Submissionreaders: everyoneShow Bibtex
Abstract:Stochastic neural net weights are used in a variety of contexts, including regularization, Bayesian neural nets, exploration in reinforcement learning, and evolution strategies. Unfortunately, due to the large number of weights, all the examples in a mini-batch typically share the same weight perturbation, thereby limiting the variance reduction effect of large mini-batches. We introduce flipout, an efficient method for decorrelating the gradients within a mini-batch by implicitly sampling pseudo-independent weight perturbations for each example. Empirically, flipout achieves the ideal linear variance reduction for fully connected networks, convolutional networks, and RNNs. We find significant speedups in training neural networks with multiplicative Gaussian perturbations. Flipout allows us to vectorize evolution strategies, so that a single GPU with flipout can handle the same throughput at least as 40 CPU cores under existing methods, equivalent to a factor-of-4 cost reduction on Amazon Web Services.
TL;DR:We introduce flipout, an efficient method for decorrelating the gradients computed by stochastic neural net weights within a mini-batch by implicitly sampling pseudo-independent weight perturbations for each example.