Scale Normalization
Henry Z Lo, Kevin Amaral, Wei Ding
Feb 18, 2016 · ICLR 2016 workshop submission
Abstract: One of the difficulties in training deep neural networks is improper scaling between layers. These scaling issues introduce exploding / vanishing gradient problems, and have typically been addressed by careful variance-preserving initialization. We consider this problem as one of preserving scale, rather than preserving variance. This leads to a simple method of scale-normalizing weight layers, which ensures that scale is approximately maintained between layers. Our method of scale preservation ensures that forward propagation is impacted minimally, while backward passes maintain gradient scales. Preliminary experiments show that scale normalization effectively speeds up learning, without introducing additional hyperparameters or parameters.
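The abstract does not spell out the normalizer, so the following is only a minimal sketch of one plausible scale-normalization scheme: rescaling each weight matrix by its Frobenius norm so that, for a random unit-variance input, the per-unit output variance stays near 1 across layers. The function name `scale_normalize` and the choice of norm are assumptions for illustration, not the paper's exact method.

```python
import numpy as np

def scale_normalize(W):
    """Rescale W (fan_out x fan_in) so ||W||_F^2 = fan_out, which keeps
    the per-unit output variance of a unit-variance random input near 1.
    Hypothetical illustration; the paper's exact normalizer may differ."""
    fan_out = W.shape[0]
    return W * (np.sqrt(fan_out) / np.linalg.norm(W))

rng = np.random.default_rng(0)
W = rng.normal(scale=5.0, size=(256, 128))   # badly scaled layer
Wn = scale_normalize(W)

x = rng.normal(size=128)                     # unit-variance input
y = Wn @ x
print(np.var(y))                             # per-unit output variance, close to 1
```

Because the rescaling is a single scalar multiple of the whole matrix, the forward pass is only rescaled rather than reshaped, and gradients flowing backward through the layer are rescaled by the same factor, which is consistent with the abstract's claim that both passes maintain their scales.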