Network Pruning at Scale: A Discrete Optimization Approach

Wenyu Chen; Riade Benbaki; Xiang Meng; Rahul Mazumder

Network Pruning at Scale: A Discrete Optimization Approach

Wenyu Chen, Riade Benbaki, Xiang Meng, Rahul Mazumder

Published: 23 Nov 2022, Last Modified: 05 May 2023OPT 2022 PosterReaders: Everyone

Abstract: Due to the ever-growing size of neural network models, there has been an emerging interest in compressing (i.e., pruning) neural networks by sparsifying weights in a pre-trained neural network, while maintaining the performance of dense model as much as possible. In this work, we focus on a neural network pruning framework based on local quadratic models of the loss function. We present an optimization-based approach with an $\ell_0$-regression formulation, and propose novel algorithms to obtain good solutions to the combinatorial optimization problem. In practice, our basic (single-stage) approach, based on one local quadratic model approximation, is up to $10^3$ times faster than existing methods while achieving similar accuracy. We also propose a multi-stage method that outperforms other methods in terms of accuracy for a given sparsity constraint while remaining computationally efficient. In particular, our approach results in a 98\% sparse (i.e., 98\% of weights in dense model are set to zero) MLPNet with 90\% test accuracy (i.e., 3\% reduction in test accuracy compared to the dense model), which is an improvement over the previous best accuracy (55\%).

0 Replies

Loading