Rethinking Again the Value of Network Pruning -- A Dynamical Isometry Perspective

Published: 28 Jan 2022, Last Modified: 13 Feb 2023, ICLR 2022 Submitted
Keywords: neural network pruning, dynamical isometry, model compression, filter pruning
Abstract: Several recent works have questioned the value of inheriting weights in structured neural network pruning, since they empirically found that training from scratch can match or even outperform finetuning a pruned model. In this paper, we present evidence that this argument is actually \emph{inaccurate} because of the use of improperly small finetuning learning rates. With larger learning rates, our results consistently show that pruning outperforms training from scratch on multiple networks (ResNets, VGG11) and datasets (MNIST, CIFAR10, ImageNet) over most pruning ratios. To understand why the finetuning learning rate plays such a critical role, we examine the underlying theoretical reason through the lens of \emph{dynamical isometry}, a desirable property of networks that allows gradient signals to preserve their norm during propagation. Our results suggest that weight removal in pruning breaks dynamical isometry, \emph{which fundamentally accounts for the performance gap between a large finetuning LR and a small one}. It is therefore necessary to recover dynamical isometry before finetuning. To this end, we also present a regularization-based technique that is simple to implement yet effective at recovering dynamical isometry in modern residual convolutional neural networks.
One-sentence Summary: We show inheriting weights in filter pruning is valuable and examine the impact of finetuning LR on the final performance via dynamical isometry.
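The abstract does not spell out the regularization-based recovery technique, so the following is only an illustrative sketch under my own assumptions, not the paper's exact method: dynamical isometry requires the singular values of the input-output Jacobian to concentrate near 1, and a common proxy is a soft orthogonality penalty on each layer's (flattened) weight matrix. The function name orthogonality_regularizer and the coefficient lam below are hypothetical.

import torch
import torch.nn as nn

def orthogonality_regularizer(model: nn.Module) -> torch.Tensor:
    """Sum of ||W W^T - I||_F^2 over conv/linear layers.

    Pushing W W^T toward the identity drives the singular values of each
    weight matrix toward 1, a commonly used proxy for dynamical isometry.
    This is a generic sketch, not necessarily the paper's regularizer.
    """
    device = next(model.parameters()).device
    reg = torch.zeros((), device=device)
    for module in model.modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            w = module.weight.flatten(1)              # conv: (out, in*k*k)
            gram = w @ w.t()                          # (out, out)
            eye = torch.eye(gram.shape[0], device=device)
            reg = reg + (gram - eye).pow(2).sum()
    return reg

# Hypothetical usage while finetuning a pruned model:
#   loss = criterion(model(x), y) + lam * orthogonality_regularizer(model)
#   loss.backward()

In this sketch the penalty would be added to the finetuning loss so that the pruned network's layers are nudged back toward (approximate) isometry before or during weight adaptation.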