Structured Pruning Meets Orthogonality

Published: 28 Jan 2022, Last Modified: 13 Feb 2023 · ICLR 2022 Submission · Readers: Everyone
Keywords: network pruning, structured pruning, dynamical isometry, model compression
Abstract: Several recent works have empirically found that the finetuning learning rate is crucial to the final performance of structured neural network pruning. It has been shown that the \emph{dynamical isometry} broken by pruning accounts for this phenomenon. However, how to develop a filter pruning method that maintains or recovers dynamical isometry \emph{and} scales to modern deep networks has remained elusive. In this paper, we present \emph{orthogonality preserving pruning} (OPP), a regularization-based structured pruning method that maintains dynamical isometry during pruning. Specifically, OPP regularizes the Gram matrix of the convolutional kernels to encourage kernel orthogonality among the important filters while driving the unimportant weights towards zero. We also propose to regularize the batch-normalization parameters to better preserve dynamical isometry across the whole network. Empirically, OPP competes with the \emph{ideal} dynamical isometry recovery method on linear networks. On non-linear networks (ResNet56/VGG19, CIFAR datasets), it outperforms the available solutions \emph{by a large margin}. Moreover, OPP also works effectively with modern deep networks (ResNets) on ImageNet, delivering encouraging performance in comparison to many recent filter pruning methods. To the best of our knowledge, this is the \emph{first} method that effectively maintains dynamical isometry during pruning for \emph{large-scale} deep neural networks.
One-sentence Summary: We propose a filter pruning method that drives the unimportant weights towards zero while maintaining dynamical isometry, and that scales readily to ImageNet.
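
The abstract describes a Gram-matrix regularizer that encourages orthogonality among important filters while shrinking unimportant ones towards zero. Below is a minimal, hedged sketch of that kind of penalty; it is not the paper's exact formulation, and the names `opp_regularizer`, `keep_mask`, and `reg_strength` are illustrative assumptions.

```python
# Sketch of a Gram-matrix regularizer in the spirit of OPP (assumed formulation,
# not taken from the paper): filters flagged "important" are pushed towards an
# orthonormal set (sub-Gram -> identity), while "unimportant" filters are driven
# to zero (their Gram entries -> 0).
import torch

def opp_regularizer(conv_weight: torch.Tensor, keep_mask: torch.Tensor) -> torch.Tensor:
    """conv_weight: (C_out, C_in, kH, kW); keep_mask: bool tensor of shape (C_out,),
    True for filters treated as important."""
    w = conv_weight.flatten(1)          # (C_out, C_in * kH * kW)
    gram = w @ w.t()                    # (C_out, C_out) Gram matrix of the kernels
    # Target Gram: identity entries for important filters, zero elsewhere, so kept
    # filters become approximately orthonormal and pruned filters shrink to zero.
    target = torch.diag(keep_mask.to(w.dtype))
    return (gram - target).pow(2).sum()

# Hypothetical usage: add the penalty to the task loss during the pruning phase.
# loss = criterion(model(x), y) + reg_strength * sum(
#     opp_regularizer(m.weight, keep_masks[name])
#     for name, m in model.named_modules() if isinstance(m, torch.nn.Conv2d))
```

A design note on this sketch: using a single target Gram matrix couples the two goals named in the abstract, since driving the diagonal entries of pruned filters to zero forces their weights towards zero, while the identity block promotes orthogonality among the remaining filters. The batch-normalization regularization mentioned in the abstract is not covered here.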