Pruning via Ranking (PvR): A unified structured pruning approach

23 Sept 2023 (modified: 06 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: structured pruning, neural network, model compression
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose Pruning via Ranking (PvR), a novel structured pruning approach that generates dense sub-networks, in terms of both function composition and model width, for any user-supplied parameter budget.
Abstract: Increases in width and depth have enabled neural networks to learn from large amounts of data, leading to state-of-the-art results on both vision and NLP tasks. To democratize such massive networks, it is important to deploy them on resource-limited devices through model compression techniques such as structured pruning. Unfortunately, most pruning methods are tailored to compressing specific models, because network architectures differ widely across tasks. At the same time, it is desirable for pruning algorithms to generate optimal sub-networks according to user-specified parameter budgets. In this work, we propose Pruning via Ranking (PvR), a novel, global structured pruning approach that generates dense sub-networks complying with any user-supplied parameter budget. PvR consists of a grouping module and a ranking module, which together produce smaller networks, in terms of both function composition and network width, for a given dataset. The smaller networks are then trained from scratch rather than fine-tuned, since we empirically demonstrate, using a recently proposed model complexity measure, that re-initialization after pruning followed by re-training yields better performance. We compare our method against multiple pruning approaches on benchmark datasets, namely CIFAR10, Tiny ImageNet, and IMDB 50K movie reviews, with standard models, namely VGG16, ResNet34, and Bert-base-uncased. We use both accuracy and model inference latency to evaluate the performance of each approach. The smaller networks proposed by PvR for a range of parameter budgets, when trained from scratch, outperform all other methods across all datasets and models. In fact, our recommended sub-networks with fewer layers suffer less than a $1$\% drop in test accuracy even after pruning $90$\% of the original model, across all networks and datasets, while enjoying lower inference latency due to reduced depth.
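
The abstract does not spell out the ranking or grouping criteria, so the sketch below only illustrates the general idea of budget-constrained, ranking-based structured pruning. It assumes a placeholder importance score (the L1 norm of each convolutional output filter) and a greedy selection of top-ranked channels under a user-supplied parameter budget; the function names and the scoring rule are illustrative assumptions, not the authors' PvR implementation, and the depth reduction (function composition) that PvR also performs is not modeled here.

```python
import torch
import torch.nn as nn

# Minimal sketch of budget-constrained, ranking-based structured pruning.
# ASSUMPTION: channels are ranked globally by the L1 norm of their filters;
# the actual PvR grouping/ranking modules are not specified in the abstract.

def rank_channels_by_l1(model: nn.Module):
    """Return (layer_name, channel_index, score, params_per_channel), best first."""
    ranked = []
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            # L1 norm of each output filter as a stand-in importance score.
            scores = module.weight.detach().abs().sum(dim=(1, 2, 3))
            params_per_channel = module.weight[0].numel()
            for idx, score in enumerate(scores.tolist()):
                ranked.append((name, idx, score, params_per_channel))
    ranked.sort(key=lambda item: item[2], reverse=True)
    return ranked

def select_channels_under_budget(ranked, budget_params: int):
    """Greedily keep top-ranked channels while the total parameter count fits the budget."""
    kept, used = {}, 0
    for name, idx, _score, n_params in ranked:
        if used + n_params > budget_params:
            continue
        kept.setdefault(name, []).append(idx)
        used += n_params
    return kept, used

if __name__ == "__main__":
    # Toy example: a small CNN pruned to roughly 50% of its convolutional parameters.
    model = nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10),
    )
    total_conv = sum(m.weight.numel() for m in model.modules() if isinstance(m, nn.Conv2d))
    kept, used = select_channels_under_budget(rank_channels_by_l1(model), total_conv // 2)
    print({layer: len(chs) for layer, chs in kept.items()})
    print(f"conv params kept: {used}/{total_conv}")
    # Per the paper's finding, the pruned architecture would then be
    # re-initialized and trained from scratch rather than fine-tuned.
```

In this toy run, the kept-channel counts per layer define a narrower architecture; following the abstract's recommendation, that architecture would be rebuilt with fresh weights and retrained, rather than inheriting the surviving weights.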
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7664