Scale Normalization

Henry Z Lo; Kevin Amaral; Wei Ding

05 Jul 2025 (modified: 18 Feb 2016), ICLR 2016, Readers: Everyone

Abstract: One difficulty in training deep neural networks is improper scaling between layers. These scaling issues give rise to exploding and vanishing gradient problems, and have typically been addressed by careful variance-preserving initialization. We treat the problem as one of preserving scale rather than preserving variance. This leads to a simple method of scale-normalizing weight layers, which keeps scale approximately constant from layer to layer. The method has minimal impact on forward propagation, while backward passes maintain gradient scales. Preliminary experiments show that scale normalization speeds up learning without introducing additional parameters or hyperparameters.

Conflicts: cs.umb.edu
