Keywords: Pruning, Model Compression, One-shot, Global Magnitude Pruning
Abstract: Neural network pruning remains an important yet challenging problem. Many pruning methods with high algorithmic complexity have been proposed over the years. In this work, we shed light on a very simple pruning technique that achieves state-of-the-art (SOTA) performance. We show that magnitude-based pruning, specifically global magnitude pruning (GP), is sufficient to achieve SOTA performance on a range of neural network architectures. In certain architectures, the last few layers of a network may get over-pruned. For these cases, we introduce a straightforward mitigation: we preserve a fixed minimum number of weights in each layer of the network, ensuring that no layer is over-pruned. We call this the Minimum Threshold (MT). We find that GP, combined with MT when needed, achieves SOTA performance on all datasets and architectures tested, including ResNet-50 and MobileNet-V1 on ImageNet. Code is available on GitHub.
One-sentence Summary: Global magnitude pruning with a minimum threshold is a very simple pruning technique that is nonetheless sufficient to obtain SOTA pruning performance.
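The procedure described above, global magnitude pruning with a per-layer Minimum Threshold, can be sketched as follows. This is an illustrative NumPy reconstruction under stated assumptions, not the paper's released code; the function name, the dict-of-arrays representation of layer weights, and the exact tie-handling are assumptions for the sketch.

```python
import numpy as np

def global_magnitude_prune(weights, sparsity, min_threshold):
    """Zero out the smallest-magnitude weights across ALL layers jointly,
    but keep at least `min_threshold` weights alive in every layer (MT).

    weights: dict mapping layer name -> np.ndarray of weights
    sparsity: fraction of total weights to prune globally (0.0 - 1.0)
    min_threshold: minimum number of surviving weights per layer
    """
    # Global step: the cutoff is the k-th smallest |w| over all layers,
    # so low-magnitude layers are pruned more heavily than others.
    all_mags = np.concatenate([np.abs(w).ravel() for w in weights.values()])
    k = int(sparsity * all_mags.size)
    cutoff = np.sort(all_mags)[k - 1] if k > 0 else -np.inf

    pruned = {}
    for name, w in weights.items():
        mask = np.abs(w) > cutoff
        # Minimum Threshold (MT) step: if the global cutoff would leave
        # fewer than `min_threshold` weights in this layer, instead keep
        # the layer's largest-magnitude weights up to that count.
        if mask.sum() < min_threshold:
            keep = min(min_threshold, w.size)
            top_idx = np.argsort(np.abs(w).ravel())[-keep:]
            mask = np.zeros(w.size, dtype=bool)
            mask[top_idx] = True
            mask = mask.reshape(w.shape)
        pruned[name] = w * mask
    return pruned
```

For example, at 50% global sparsity a small final layer whose weights all fall below the global cutoff would normally be pruned away entirely; the MT step keeps its top `min_threshold` weights instead, at the cost of slightly lower overall sparsity.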