A Fast Framework for Post-training Structured Pruning Without Retraining

20 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Model Compression, Structured pruning, Limited data
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose a post-training structured pruning framework that does not require any retraining.
Abstract: Pruning has become a widely adopted technique for compressing and accelerating deep neural networks. However, most pruning approaches rely on lengthy retraining procedures to restore performance, rendering them impractical in many real-world settings where data privacy regulations or computational constraints prohibit extensive retraining. To address this limitation, we propose a novel framework for rapidly pruning pre-trained models without any retraining. Our framework focuses on structured pruning: it first groups coupled structures across layers according to their dependencies, measures channel importance at the group level, and removes the least important channels in each group. We then introduce a two-phase layer reconstruction strategy that uses a small amount of unlabeled data to recover the accuracy lost to pruning. The first phase imposes a sparsity penalty on the less important channels to squeeze their information into the remaining components before pruning. The second phase performs the pruning and calibrates the pruned model's layer outputs against those of the original model to reconstruct the output signal. Experiments demonstrate that our framework achieves significant improvements over retraining-free methods and matches the accuracy of pruning approaches that require expensive retraining. With access to only about 0.2\% of the ImageNet training set, our method achieves up to a 1.73x reduction in FLOPs while maintaining 72.58\% accuracy for ResNet-50. Notably, our framework prunes networks within a few minutes on a single GPU, orders of magnitude faster than retraining-based techniques.
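To illustrate the second reconstruction phase described in the abstract, below is a minimal, hedged sketch of layer-wise output calibration after channel pruning: the surviving weights of a layer are refit by least squares so that the pruned layer reproduces the original layer's outputs on a small unlabeled calibration batch. This is not the authors' released code; the function name `calibrate_linear` and the arguments `keep_idx` and `calib_inputs` are illustrative assumptions, and the sketch covers only a fully connected layer for brevity.

```python
# Hedged sketch (assumed implementation, not the paper's code): calibrate a
# pruned Linear layer so its output matches the original layer on a small
# unlabeled calibration batch, via closed-form least squares.
import torch


@torch.no_grad()
def calibrate_linear(orig_layer: torch.nn.Linear,
                     keep_idx: torch.Tensor,
                     calib_inputs: torch.Tensor) -> torch.nn.Linear:
    # Target signal: the original (unpruned) layer's output on calibration data.
    target = orig_layer(calib_inputs)                      # (N, out_features)

    # Pruned input: keep only the surviving input channels.
    pruned_in = calib_inputs[:, keep_idx]                  # (N, k)

    # Append a constant column so the bias is refit jointly with the weights.
    ones = torch.ones(pruned_in.shape[0], 1, dtype=pruned_in.dtype)
    design = torch.cat([pruned_in, ones], dim=1)           # (N, k + 1)

    # Solve min_W || design @ W - target ||_F  (W stacks weights and bias).
    sol = torch.linalg.lstsq(design, target).solution      # (k + 1, out_features)

    # Build the pruned layer with the reconstructed parameters.
    new_layer = torch.nn.Linear(keep_idx.numel(), orig_layer.out_features)
    new_layer.weight.copy_(sol[:-1].T)
    new_layer.bias.copy_(sol[-1])
    return new_layer
```

In this reading, each layer is reconstructed locally from a few hundred unlabeled samples, which is why the whole procedure can finish in minutes on a single GPU; convolutional layers would be handled analogously by unfolding their inputs into a matrix before the same least-squares fit.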
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2215