Linearly Constrained Weights: Resolving the Vanishing Gradient Problem by Reducing Angle Bias

15 Feb 2018 (modified: 10 Feb 2022) · ICLR 2018 Conference Blind Submission · Readers: Everyone
Abstract: In this paper, we first identify \textit{angle bias}, a simple but remarkable phenomenon that causes the vanishing gradient problem in a multilayer perceptron (MLP) with sigmoid activation functions. We then propose \textit{linearly constrained weights (LCW)} to reduce the angle bias in a neural network; with LCW, the network is trained under the constraint that the elements of each weight vector sum to zero. A reparameterization technique is presented to efficiently train a model with LCW by embedding the constraints on weight vectors into the structure of the network. Interestingly, batch normalization (Ioffe & Szegedy, 2015) can be viewed as a mechanism to correct angle bias. Preliminary experiments show that LCW helps train a 100-layer MLP more efficiently than does batch normalization.
TL;DR: We identify angle bias, a phenomenon that causes the vanishing gradient problem in deep nets, and propose an efficient method to reduce it.
Keywords: vanishing gradient problem, multilayer perceptron, angle bias
Data: [CIFAR-10](https://paperswithcode.com/dataset/cifar-10), [CIFAR-100](https://paperswithcode.com/dataset/cifar-100), [SVHN](https://paperswithcode.com/dataset/svhn)
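
The abstract describes LCW as training the network under a zero-sum constraint on each weight vector, enforced through a reparameterization embedded in the network structure. As a rough illustration only (not the authors' exact parameterization; the layer name `LCWLinear` and the mean-subtraction scheme are assumptions), a minimal PyTorch sketch of a linear layer whose weight rows are constrained to sum to zero could look like:

```python
import torch
import torch.nn as nn

class LCWLinear(nn.Module):
    """Illustrative sketch: each weight vector (row of W) is reparameterized
    by subtracting its mean, so its elements sum to zero by construction.
    This is an assumed scheme, not necessarily the paper's exact one."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Unconstrained parameters; the zero-sum constraint is imposed in forward().
        self.v = nn.Parameter(torch.randn(out_features, in_features) / in_features ** 0.5)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Centering each row enforces sum_j w[i, j] == 0 for every output unit i.
        w = self.v - self.v.mean(dim=1, keepdim=True)
        return x @ w.t() + self.bias


# Example: a deep MLP with sigmoid activations built from such layers,
# echoing the deep-MLP setting mentioned in the abstract.
layers = []
dims = [784] + [128] * 10  # hypothetical layer sizes
for d_in, d_out in zip(dims[:-1], dims[1:]):
    layers += [LCWLinear(d_in, d_out), nn.Sigmoid()]
mlp = nn.Sequential(*layers)
out = mlp(torch.randn(32, 784))  # forward pass on a dummy batch
```

Because the constraint is applied inside the forward computation, a standard optimizer keeps every effective weight vector on the zero-sum hyperplane without any separate projection step, which is one way to read the reparameterization idea mentioned in the abstract.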