Linearly Constrained Weights: Resolving the Vanishing Gradient Problem by Reducing Angle Bias


Nov 03, 2017 (modified: Nov 03, 2017) ICLR 2018 Conference Blind Submission readers: everyone Show Bibtex
  • Abstract: In this paper, we first identify \textit{angle bias}, a simple but remarkable phenomenon that causes the vanishing gradient problem in a multilayer perceptron (MLP) with sigmoid activation functions. We then propose \textit{linearly constrained weights (LCW)} to reduce the angle bias in a neural network, so as to train the network under the constraints that the sum of the elements of each weight vector is zero. A reparameterization technique is presented to efficiently train a model with LCW by embedding the constraints on weight vectors into the structure of the network. Interestingly, batch normalization (Ioffe & Szegedy, 2015) can be viewed as a mechanism to correct angle bias. Preliminary experiments show that LCW helps train a 100-layered MLP more efficiently than does batch normalization.
  • TL;DR: We identify angle bias that causes the vanishing gradient problem in deep nets and propose an efficient method to reduce the bias.
  • Keywords: vanishing gradient problem, multilayer perceptron, angle bias