FixNorm: Dissecting Weight Decay for Training Deep Neural Networks

Yucong Zhou; Yunxiao Sun; Jian Zhang; Zhao Zhong

28 Sept 2020 (modified: 05 May 2023) | ICLR 2021 Conference Withdrawn Submission | Readers: Everyone

Keywords: weight decay, effective learning rate, cross-boundary risk, hyperparameter tuning

Abstract: Weight decay is a widely used technique for training deep neural networks (DNNs). It greatly affects generalization performance, but the underlying mechanisms are not fully understood. Recent works show that for layers followed by normalization, weight decay mainly affects the effective learning rate. However, although normalization has been extensively adopted in modern DNNs, some layers, such as the final fully-connected layer, do not satisfy this precondition. For these layers, the effects of weight decay are still unclear. In this paper, we comprehensively investigate the mechanisms of weight decay and find that, in addition to influencing the effective learning rate, weight decay has another distinct and equally important mechanism: it affects generalization performance by controlling cross-boundary risk. Together, these two mechanisms give a more comprehensive explanation of the effects of weight decay. Based on this discovery, we propose a new training method called FixNorm, which discards weight decay and directly controls the two mechanisms. We also propose a practical method for tuning the hyperparameters of FixNorm, which finds near-optimal solutions 2 to 3 times faster than Bayesian Optimization. On the ImageNet classification task, training EfficientNet-B0 with FixNorm achieves 77.7%, which outperforms the original baseline by a clear margin. Surprisingly, when scaling MobileNetV2 to the same FLOPS and applying the same tricks as EfficientNet-B0, training with FixNorm achieves 77.4%, which shows the importance of well-tuned training procedures and further verifies the effectiveness of our approach. We set up more well-tuned baselines using FixNorm to facilitate fair comparisons in the community.
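The abstract's statement about the effective learning rate can be illustrated with a small numerical sketch. This is not the FixNorm method, whose details are not given on this page. It only shows the commonly cited property behind the term: when a loss depends on a weight vector solely through its direction (as for a layer followed by normalization), an SGD step with learning rate lr on weights scaled by c changes the direction exactly as a step with learning rate lr / c^2 on the unscaled weights. The toy loss and the names `grad`, `sgd_step`, `w`, `t`, `lr`, and `c` are illustrative assumptions.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def unit(v):
    n = norm(v)
    return [x / n for x in v]

def grad(w, t):
    """Gradient of the toy scale-invariant loss f(w) = -dot(w / ||w||, t)."""
    n = norm(w)
    wt = sum(a * b for a, b in zip(w, t))
    return [-(ti / n - wt * wi / n ** 3) for wi, ti in zip(w, t)]

def sgd_step(w, t, lr):
    """One plain SGD step, with no weight decay."""
    g = grad(w, t)
    return [wi - lr * gi for wi, gi in zip(w, g)]

# Hypothetical values chosen only for illustration.
w = [3.0, 4.0]
t = [1.0, 0.0]
lr = 0.1
c = 2.0

scaled = [c * x for x in w]

# The gradient shrinks by 1/c when the weights are scaled by c.
g_w = grad(w, t)
g_scaled = grad(scaled, t)

# Direction after a step on the scaled weights with lr ...
dir_scaled = unit(sgd_step(scaled, t, lr))
# ... matches the direction after a step on w with lr / c**2.
dir_equiv = unit(sgd_step(w, t, lr / c ** 2))
```

Under this assumption, shrinking the weight norm (which weight decay does) raises the step size that the weight direction actually experiences, which is the sense in which weight decay is said to act on the effective learning rate for normalized layers.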

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Reviewed Version (pdf): https://openreview.net/references/pdf?id=w_PIMmggtA
