What Makes a Good Prune? Maximal Unstructured Pruning for Maximal Cosine Similarity

Gabryel Mason-Williams; Fredrik Dahlqvist

What Makes a Good Prune? Maximal Unstructured Pruning for Maximal Cosine Similarity

Gabryel Mason-Williams, Fredrik Dahlqvist

Published: 16 Jan 2024, Last Modified: 21 Apr 2024ICLR 2024 posterEveryoneRevisionsBibTeX

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: Pruning, Deep Learning, Neural Networks, Interpretability, Loss landscapes, Optimization, Kurtosis

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

TL;DR: This paper explores the mechanisms that underpin pruning, ultimately showing that the higher the kurtosis of a model's parameter distribution, the more it can be pruned while maintaining performance.

Abstract: Pruning is an effective method to reduce the size of deep neural network models, maintain accuracy, and, in some cases, improve the network's overall performance. However, the mechanisms underpinning pruning remain unclear. Why can different methods prune by different percentages yet achieve similar performance? Why can we not prune at the start of training? Why are some models more amenable to being pruned than others? Given a model, what is the maximum amount it can be pruned before significantly affecting the performance? This paper explores and answers these questions from the global unstructured magnitude pruning perspective with one epoch of fine-tuning. We develop the idea that cosine similarity is an effective proxy measure for functional similarity between the parent and the pruned network. We prove that the L1 pruning method is optimal when pruning by cosine similarity. We show that the higher the kurtosis of a model's parameter distribution, the more it can be pruned while maintaining performance. Finally, we present a simple method to determine the optimal amount by which a network can be L1-pruned based on its parameter distribution. The code demonstrating the method is available at https://github.com/gmw99/what_makes_a_good_prune

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Primary Area: general machine learning (i.e., none of the above)

Submission Number: 7800

Loading