Keywords: pruning, cnn, transformers
TL;DR: Existing methods can be too complex and require too many extra parameters. Our simple and efficient method yields state-of-the-art random/structured/channel pruning results.
Abstract: DNN pruning is a popular way to reduce the size of a model, improve the inference
latency, and minimize the power consumption on DNN accelerators. However,
existing approaches might be too complex, expensive, or ineffective to apply to
a variety of vision/language tasks and DNN architectures while honoring structured
pruning constraints. In this paper, we propose an efficient yet effective train-time
pruning scheme, Parameter-free Differentiable Pruning (PDP), which offers state-
of-the-art qualities in model size, accuracy, and training cost. PDP uses a dynamic
function of weights during training to generate soft pruning masks for the weights
in a parameter-free manner for a given pruning target. Although PDP is differentiable, its
simplicity and efficiency make it universal enough to deliver state-of-the-art
random/structured/channel pruning results on various vision and natural language
tasks. For example, for MobileNet-v1, PDP can achieve 68.2% top-1 ImageNet1k
accuracy at 86.6% sparsity, which is 1.7% higher than that of the
state-of-the-art algorithms. Also, PDP yields over 83.1% accuracy on Multi-Genre
Natural Language Inference with 90% sparsity for BERT, while the next best from
the existing techniques shows 81.5% accuracy. In addition, PDP can be applied to
structured pruning, such as N:M pruning and channel pruning. For 1:4 structured
pruning of ResNet18, PDP improved the top-1 ImageNet1k accuracy by more than 3.6%
over the state-of-the-art. For channel pruning of ResNet50, PDP reduced the top-1
ImageNet1k accuracy by 0.6% from the state-of-the-art.
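
To make the N:M structured constraint concrete, the following is a minimal sketch of a hard magnitude-based N:M mask. It is not the PDP soft-mask procedure, which the abstract does not specify in detail. It assumes that N:M means keeping the N largest-magnitude weights in every group of M consecutive weights, so 1:4 corresponds to 75% sparsity. The function name `nm_mask` and the example weights are hypothetical and chosen only for illustration.

```python
import numpy as np

def nm_mask(weights: np.ndarray, n: int = 1, m: int = 4) -> np.ndarray:
    """Boolean mask keeping the n largest-|w| entries in each group of m.

    Assumption: groups are formed from consecutive weights in row-major
    order, and weights.size is divisible by m.
    """
    flat = weights.reshape(-1, m)
    # Column indices of the n largest magnitudes within each group.
    keep = np.argsort(-np.abs(flat), axis=1)[:, :n]
    mask = np.zeros(flat.shape, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    return mask.reshape(weights.shape)

# Hypothetical weights: two groups of four.
w = np.array([[0.1, -0.9, 0.3, 0.2, 0.5, 0.05, -0.6, 0.4]])
mask = nm_mask(w, n=1, m=4)
pruned = w * mask              # weights outside the mask become zero
sparsity = 1.0 - mask.mean()   # fraction of weights removed
```

A train-time scheme such as the one described above would replace this hard 0/1 mask with a soft, differentiable one during training; the hard mask here only illustrates the constraint that the final weights must satisfy.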
Submission Number: 13555