With a well-trained ``full-size'' network, model pruning aims to derive a small network by removing some weights with minimal performance deterioration, e.g., in image classification accuracy. In the typical setup, both model training and pruning are done on the \emph{same} target dataset, which represents a downstream task of interest. On the other hand, to better solve a downstream task in the real world, the well-established practice is to transfer a model pretrained on some source data to the target dataset via finetuning. These two worlds motivate us to study model pruning in a new, realistic setup that embraces a pretrained model and allows transferring it to the target dataset. In this new setup, we first show that, as expected, transferring a pretrained model remarkably improves state-of-the-art (SOTA) pruning methods once they follow a principled pruning pipeline: \emph{transfer the pretrained model by finetuning on the target dataset, prune, and finetune again.} Surprisingly, in the new setup, simple random pruning (which removes random filters) and the L1-norm method (which removes filters with small L1 norms) outperform SOTA methods, and the latter performs the best! Building on the simple L1-norm method, we propose two techniques that further improve pruning performance by exploiting the full-size model. Specifically, when finetuning the L1-norm-pruned model, our techniques (1) directly reuse the full-size model's classifier, or (2) regularize the pruned model during finetuning by aligning its features to the off-the-shelf class means computed by the full-size model. Extensive experiments on large-scale benchmark datasets demonstrate that our techniques significantly outperform existing approaches.
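The following minimal PyTorch sketch is our illustration, not the authors' released code. It shows how the two ingredients named above might look in practice: scoring convolutional filters by their L1 norms (smaller norms are pruned first) and a feature-alignment regularizer that pulls the pruned model's features toward class means precomputed with the full-size model. The function names and the MSE form of the alignment loss are assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def l1_filter_scores(conv_weight: torch.Tensor) -> torch.Tensor:
    """L1 norm of each output filter of a Conv2d weight; lower scores are pruned first."""
    # conv_weight has shape (out_channels, in_channels, kH, kW)
    return conv_weight.abs().sum(dim=(1, 2, 3))


def class_mean_alignment_loss(pruned_feats: torch.Tensor,
                              labels: torch.Tensor,
                              class_means: torch.Tensor) -> torch.Tensor:
    """Regularizer (assumed MSE form) aligning pruned-model features to the
    full-size model's per-class feature means.

    class_means: (num_classes, feat_dim), computed once with the full-size model.
    pruned_feats: (batch, feat_dim) features from the pruned model being finetuned.
    """
    targets = class_means[labels]  # look up the class mean for each sample
    return F.mse_loss(pruned_feats, targets)


# Example usage (hypothetical layer and 50% keep ratio):
# scores = l1_filter_scores(model.layer1[0].conv1.weight)
# keep_idx = scores.argsort(descending=True)[: scores.numel() // 2]
```

In this sketch the alignment loss would simply be added, with some weight, to the standard cross-entropy objective during finetuning of the pruned model; the weighting scheme is not specified here.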