Scale Normalization

ICLR 2016 workshop submission (modified: 18 Feb 2016) · Readers: Everyone
Abstract: One of the difficulties of training deep neural networks is improper scaling between layers. These scaling issues introduce exploding / vanishing gradient problems, and have typically been addressed by careful variance-preserving initialization. We consider this problem as one of preserving scale, rather than preserving variance. This leads to a simple method of scale-normalizing weight layers, which ensures that scale is approximately maintained from layer to layer. Our scale-preservation scheme affects forward propagation minimally, while the backward pass maintains gradient scales. Preliminary experiments show that scale normalization effectively speeds up learning, without introducing additional hyperparameters or parameters.
Conflicts: cs.umb.edu
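
The abstract does not spell out the exact normalization, so the following is a minimal sketch of the general idea, assuming that "scale-normalizing weight layers" means rescaling each weight matrix by a norm-based factor so that a layer's output has roughly the same scale as its input. The function name scale_normalize, the choice of Frobenius norm, and the toy 20-layer network are illustrative assumptions, not the paper's implementation.

# Illustrative sketch only; not the paper's exact formulation.
import numpy as np

rng = np.random.default_rng(0)

def scale_normalize(W):
    """Rescale W so that, for typical inputs x, ||W @ x|| is roughly ||x||."""
    # Heuristic: for a weight matrix with roughly isotropic rows,
    # E[||W x||^2] ~ (||W||_F^2 / fan_in) * ||x||^2, so rescaling W to have
    # Frobenius norm sqrt(fan_in) approximately preserves activation scale.
    fan_in = W.shape[1]
    return W * (np.sqrt(fan_in) / np.linalg.norm(W))

# Toy demonstration: push a random vector through 20 linear layers with a
# deliberately mis-scaled initialization, with and without scale normalization.
x = rng.standard_normal(256)
h_raw = x.copy()
h_norm = x.copy()
for _ in range(20):
    W = 0.1 * rng.standard_normal((256, 256))
    h_raw = W @ h_raw                      # activation scale drifts multiplicatively
    h_norm = scale_normalize(W) @ h_norm   # activation scale stays close to that of x

print("unnormalized scale ratio:", np.linalg.norm(h_raw) / np.linalg.norm(x))
print("scale-normalized ratio:  ", np.linalg.norm(h_norm) / np.linalg.norm(x))

Because the rescaling is a fixed multiplicative factor per layer, the same argument applied to the transposed weights suggests why gradient scales would also be roughly maintained in the backward pass, which is the property the abstract emphasizes.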