Abstract: Network pruning is widely used for reducing the heavy computational cost of deep models. A typical pruning algorithm is a three-stage pipeline, i.e., training (a large model), pruning, and fine-tuning. In this work, we make a rather surprising observation: fine-tuning a pruned model only gives comparable or even worse performance than training that model with randomly initialized weights. Our results have several implications: 1) training a large, over-parameterized model is not necessary to obtain an efficient final model, 2) the learned "important" weights of the large model are not necessarily useful for the small pruned model, and 3) the pruned architecture itself, rather than a set of inherited weights, is what leads to the efficiency benefit in the final model, which suggests that some pruning algorithms could be seen as performing network architecture search.
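As an illustration (not the authors' exact setup), the sketch below walks through the three-stage pipeline and the train-from-scratch baseline in PyTorch. The L1 magnitude criterion, the toy model, and the random-data training loop are hypothetical placeholders, standing in for the pruning methods, datasets, and schedules actually evaluated in the paper.

```python
# Minimal sketch of train -> prune -> fine-tune, plus the scratch-training baseline.
# All components (model, data, pruning criterion, budgets) are illustrative only.
import copy
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def train(model, steps=100):
    # Placeholder training loop on random data; a real experiment would use
    # the actual dataset, optimizer settings, and schedule.
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        x = torch.randn(32, 100)
        y = torch.randint(0, 10, (32,))
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return model

def make_model():
    return nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 10))

# Stage 1: train a large (over-parameterized) model.
large = train(make_model())

# Stage 2: prune, e.g. remove 60% of the smallest-magnitude weights per layer.
pruned = copy.deepcopy(large)
for m in pruned.modules():
    if isinstance(m, nn.Linear):
        prune.l1_unstructured(m, name="weight", amount=0.6)

# Stage 3a: fine-tune the pruned model, inheriting the "important" weights.
fine_tuned = train(copy.deepcopy(pruned), steps=20)

# Stage 3b (baseline): keep only the pruned architecture (here, the sparsity
# masks), re-randomize the surviving weights, and train from scratch.
scratch = copy.deepcopy(pruned)
for m in scratch.modules():
    if isinstance(m, nn.Linear):
        nn.init.kaiming_uniform_(m.weight_orig)  # discard inherited weights
scratch = train(scratch, steps=100)  # scratch training gets a comparable budget
```

Comparing the accuracy of `fine_tuned` and `scratch` under matched training budgets is the kind of experiment the abstract refers to; the paper reports that the scratch-trained model is comparable or better.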
TL;DR: In network pruning, fine-tuning a pruned model only gives comparable or worse performance than training it from scratch. This calls for a rethinking of existing pruning algorithms.
Keywords: Network Pruning, Network Compression, Architecture Search, Training from Scratch