With a well-trained ``full-size'' network, model pruning aims to derive a small network by removing some weights with minimal performance deterioration, e.g., in image classification accuracy. In the typical setup, both model training and pruning are done on the \emph{same} target dataset, which represents a downstream task of interest. On the other hand, to better solve a downstream task in the real world, the well-established practice is to transfer a model pretrained on some source data to the target dataset via finetuning. These two worlds motivate us to study model pruning in a new, realistic setup that embraces a pretrained model and allows transferring it to the target dataset. In this new setup, we first show that, as expected, transferring a pretrained model remarkably improves state-of-the-art (SOTA) pruning methods once they follow a principled pruning pipeline: \emph{transfer the pretrained model by finetuning on the target dataset, prune, and finetune again.} Surprisingly, in the new setup, simple random pruning (which removes random filters) and the L1-norm method (which removes filters with small L1 norms) outperform SOTA methods, and the latter performs the best! Building on the simple L1-norm method, we propose two techniques that further improve pruning performance by exploiting the full-size model. Specifically, when finetuning the L1-norm-pruned model, our techniques (1) directly reuse the full-size model's classifier, or (2) regularize the pruned model during finetuning by aligning its features to the off-the-shelf class means computed by the full-size model. Extensive experiments on large-scale benchmark datasets demonstrate that our techniques significantly outperform existing approaches.
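The following minimal PyTorch sketch is our illustration, not the authors' released code. It shows how the two ingredients named above might look in practice: scoring convolutional filters by their L1 norms (smaller norms are pruned first) and a feature-alignment regularizer that pulls the pruned model's features toward class means precomputed with the full-size model. The function names and the MSE form of the alignment loss are assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def l1_filter_scores(conv_weight: torch.Tensor) -> torch.Tensor:
    """L1 norm of each output filter of a Conv2d weight; lower scores are pruned first."""
    # conv_weight has shape (out_channels, in_channels, kH, kW)
    return conv_weight.abs().sum(dim=(1, 2, 3))


def class_mean_alignment_loss(pruned_feats: torch.Tensor,
                              labels: torch.Tensor,
                              class_means: torch.Tensor) -> torch.Tensor:
    """Regularizer (assumed MSE form) aligning pruned-model features to the
    full-size model's per-class feature means.

    class_means: (num_classes, feat_dim), computed once with the full-size model.
    pruned_feats: (batch, feat_dim) features from the pruned model being finetuned.
    """
    targets = class_means[labels]  # look up the class mean for each sample
    return F.mse_loss(pruned_feats, targets)


# Example usage (hypothetical layer and 50% keep ratio):
# scores = l1_filter_scores(model.layer1[0].conv1.weight)
# keep_idx = scores.argsort(descending=True)[: scores.numel() // 2]
```

In this sketch the alignment loss would simply be added, with some weight, to the standard cross-entropy objective during finetuning of the pruned model; the weighting scheme is not specified here.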