Keywords: differentiable pruning, parameter-free
TL;DR: Existing differentiable pruning schemes appear too expensive in training cost. Our method shows that a differentiable yet parameter-free approach can deliver state-of-the-art results.
Abstract: In this paper, we propose an efficient yet effective train-time pruning scheme, Parameter-free Differentiable Pruning (PDP), which offers state-of-the-art quality in model size, accuracy, and training cost. During training, PDP uses a dynamic function of the weights to generate soft pruning masks for a given pruning target, in a parameter-free manner. While differentiable, the simplicity and efficiency of PDP make it universal enough to deliver state-of-the-art random/structured/channel pruning results on various vision models. For example, for MobileNet-v1, PDP achieves 68.2% top-1 ImageNet1k accuracy at 86.6% sparsity, which is 1.7% higher than the state-of-the-art algorithms. PDP also improves the top-1 ImageNet1k accuracy of ResNet18 by over 3.6% over the state-of-the-art, and for ResNet50 its top-1 ImageNet1k accuracy is only 0.6% below the state-of-the-art.
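To make the abstract's central mechanism concrete, a parameter-free soft pruning mask can be computed directly from the weights themselves. The sketch below is a minimal illustration under assumptions not stated in the abstract: squared-magnitude importance scores, a threshold taken at the sparsity quantile, and a sigmoid with temperature tau. The function name `soft_pruning_mask` and the exact formula are hypothetical, not the paper's precise formulation.

```python
import torch

def soft_pruning_mask(w: torch.Tensor, target_sparsity: float, tau: float = 0.01) -> torch.Tensor:
    """Illustrative parameter-free soft mask (assumed formulation, not the paper's exact one).

    Weights whose squared magnitude falls below the sparsity-quantile threshold
    receive mask values near 0; the rest receive values near 1. Smaller tau
    yields a harder (more binary) mask.
    """
    scores = w.detach().pow(2)                        # importance = squared magnitude (no grad for threshold)
    t = torch.quantile(scores.flatten(), target_sparsity)  # threshold set by the pruning target
    return torch.sigmoid((w.pow(2) - t) / tau)        # soft, differentiable mask in (0, 1)

# Usage sketch: apply the soft mask in the forward pass so the layer
# effectively uses pruned weights while training proceeds as usual.
w = torch.randn(256, 128, requires_grad=True)
mask = soft_pruning_mask(w, target_sparsity=0.9)
effective_w = w * mask  # used in place of w in the layer's computation
```

Because the mask stays in (0, 1) and depends smoothly on the weights, gradients flow through both the weights and the mask, which is what makes such a scheme differentiable without introducing any extra learnable mask parameters.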
Submission Number: 50