Scale Normalization

Henry Z Lo, Kevin Amaral, Wei Ding

Feb 18, 2016 (modified: Feb 18, 2016) ICLR 2016 workshop submission readers: everyone
  • Abstract: One of the difficulties of training deep neural networks is caused by improper scaling between layers. These scaling issues introduce exploding / gradient problems, and have typically been addressed by careful variance-preserving initialization. We consider this problem as one of preserving scale, rather than preserving variance. This leads to a simple method of scale-normalizing weight layers, which ensures that scale is approximately maintained between layers. Our method of scale-preservation ensures that forward propagation is impacted minimally, while backward passes maintain gradient scales. Preliminary experiments show that scale normalization effectively speeds up learning, without introducing additional hyperparameters or parameters.
  • Conflicts: