HYPERPRUNING: EFFICIENT PRUNING THROUGH LYAPUNOV METRIC HYPERSEARCH

Published: 01 Feb 2023, Last Modified: 13 Feb 2023 · Submitted to ICLR 2023 · Readers: Everyone
Keywords: Network Pruning, Efficient Hyperparameter Searching, Lyapunov Spectrum
TL;DR: We propose a novel method to search over pruning methods and their hyperparameters based on the Lyapunov Spectrum.
Abstract: A variety of pruning methods have been introduced to improve the power and storage efficiency of over-parameterized recurrent neural networks. With the growing number and variety of pruning methods, a new problem of ‘hyperpruning’ is becoming apparent: finding a suitable pruning method, with an optimal hyperparameter configuration, for a particular task and network. Such a search differs from standard hyperparameter search, in which the accuracy of the optimal configuration is unknown. In network pruning, the accuracy of the non-pruned (dense) model sets a target for the accuracy of the pruned model, and the goal of hyperpruning is to reach or even surpass this target. Efficient strategies for hyperpruning are critical, since direct search through pruned variants requires time-consuming training with no guarantee of improved performance. To address this problem, we introduce a novel distance based on the Lyapunov Spectrum (LS), which makes it possible to compare pruned variants with the dense model and, early in training, to estimate the accuracy that pruned variants will achieve after extensive training. This ability to predict performance allows us to combine the LS-based distance with Bayesian hyperparameter optimization and to propose an efficient, first-of-its-kind hyperpruning approach called LS-based Hyperpruning (LSH), which reduces search time by an order of magnitude compared to standard full-training search with loss (or perplexity) as the accuracy metric. Our experiments on stacked LSTM and RHN language models trained on the Penn Treebank dataset show that, for a given budget of training epochs and a desired pruning ratio, LSH obtains better variants than standard loss-based hyperparameter optimization methods. Furthermore, LSH identifies pruned variants that outperform state-of-the-art pruning methods and surpass the accuracy of the dense model.
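
The following minimal Python sketch (not the authors' implementation) illustrates the idea described in the abstract: the Lyapunov Spectrum of a small RNN's hidden-state dynamics is estimated with the standard Benettin/QR reorthonormalization algorithm, a Euclidean distance between spectra serves as a stand-in for the paper's LS-based distance, and scikit-optimize's gp_minimize plays the role of the Bayesian optimizer. The magnitude-pruning step and the one-dimensional search space are illustrative assumptions; the paper searches over entire pruning methods and their hyperparameters.

    # Hedged sketch, not the authors' code: score pruned RNN variants by the
    # distance between their Lyapunov Spectrum (LS) and that of the dense model,
    # and minimize this distance with a Bayesian optimizer.
    import numpy as np
    import torch
    from skopt import gp_minimize          # stand-in Bayesian optimizer (assumption)
    from skopt.space import Real

    torch.manual_seed(0)
    HIDDEN, T, K = 16, 200, 8              # state size, trajectory length, #exponents

    W = torch.randn(HIDDEN, HIDDEN) / HIDDEN ** 0.5   # "dense" recurrent weights
    xs = torch.randn(T, HIDDEN)                       # driving input sequence

    def step(W, x, h):
        # One step of a vanilla RNN: h' = tanh(W h + x)
        return torch.tanh(h @ W.T + x)

    def lyapunov_spectrum(W, xs, k=K):
        """Leading k Lyapunov exponents via Benettin-style QR reorthonormalization."""
        h = torch.zeros(HIDDEN)
        Q = torch.eye(HIDDEN)[:, :k]
        log_r = torch.zeros(k)
        for x in xs:
            # Jacobian of the hidden-state map at the current point.
            J = torch.autograd.functional.jacobian(lambda hh: step(W, x, hh), h)
            Q, R = torch.linalg.qr(J @ Q)
            log_r += torch.log(R.diagonal().abs() + 1e-12)
            h = step(W, x, h)
        return (log_r / len(xs)).numpy()

    dense_ls = lyapunov_spectrum(W, xs)

    def ls_distance(ls_a, ls_b):
        # One plausible choice of metric: Euclidean distance between sorted spectra.
        return float(np.linalg.norm(np.sort(ls_a) - np.sort(ls_b)))

    def objective(params):
        """Score a pruned variant by its LS distance to the dense model."""
        (sparsity,) = params
        # Magnitude pruning (illustrative): zero the smallest-magnitude weights.
        thresh = W.abs().flatten().kthvalue(int(sparsity * W.numel())).values
        W_pruned = W * (W.abs() > thresh)
        return ls_distance(lyapunov_spectrum(W_pruned, xs), dense_ls)

    # Bayesian search over the pruning ratio; in LSH this distance would also
    # guide the choice among pruning methods and their hyperparameters.
    result = gp_minimize(objective, [Real(0.5, 0.95, name="sparsity")], n_calls=15)
    print("best sparsity:", result.x[0], "LS distance:", result.fun)

In the actual LSH procedure, each candidate would be trained briefly before its spectrum is measured, so the LS distance acts as an early predictor of post-training accuracy rather than a static property of the initial weights.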
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Applications (e.g., speech processing, computer vision, NLP)
Supplementary Material: zip