Integral Pruning on Activations and Weights for Efficient Neural Networks

Qing Yang; Wei Wen; Zuoguan Wang; Yiran Chen; Hai Li

27 Sept 2018 (modified: 05 May 2023) | ICLR 2019 Conference Blind Submission | Readers: Everyone

Abstract: With the rapid scaling up of deep neural networks (DNNs), extensive research on network model compression, such as weight pruning, has been conducted to enable efficient deployment. This work extends compression beyond the weights to the activations of DNNs. We propose the Integral Pruning (IP) technique, which integrates activation pruning with weight pruning. By learning the different importance of neuron responses and connections, the generated network, namely IPnet, balances the sparsity between activations and weights and thereby further improves execution efficiency. The feasibility and effectiveness of IPnet are thoroughly evaluated on various network models with different activation functions and on different datasets. With less than 0.5% change in test accuracy, IPnet saves 71.1% to 96.35% of the computation cost compared with the original dense models, with up to 5.8x and 10x reductions in the numbers of activations and weights, respectively.

Keywords: activation pruning, weight pruning, computation cost reduction, efficient DNNs

TL;DR: This work advances DNN compression beyond the weights to the activations by integrating the activation pruning with the weight pruning.
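Why pruning both activations and weights compounds the savings can be seen on a single fully connected layer. A multiply-accumulate (MAC) is only needed when both the weight and the input activation it multiplies are nonzero. The abstract does not state how IPnet selects which weights or activations to drop, so the sketch below uses two assumed stand-ins: magnitude thresholding for weights and keeping the top fraction of activations by magnitude. The helper names (`prune_weights`, `prune_activations`, `effective_macs`) and all numbers are hypothetical, and the retraining that would normally follow pruning is omitted.

```python
"""Hypothetical sketch: combine weight pruning with activation pruning on one
fully connected layer and count the multiply-accumulates (MACs) that remain.
The selection criteria are assumptions, not IPnet's actual procedure."""

from typing import List

Matrix = List[List[float]]
Vector = List[float]


def prune_weights(weights: Matrix, threshold: float) -> Matrix:
    """Zero out weights whose magnitude is below `threshold` (assumed criterion)."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]


def prune_activations(acts: Vector, keep_ratio: float) -> Vector:
    """Keep only the largest-magnitude fraction `keep_ratio` of activations
    (assumed criterion) and zero out the rest."""
    k = max(1, round(len(acts) * keep_ratio))
    ranked = sorted(range(len(acts)), key=lambda i: abs(acts[i]), reverse=True)
    kept = set(ranked[:k])
    return [a if i in kept else 0.0 for i, a in enumerate(acts)]


def effective_macs(weights: Matrix, acts: Vector) -> int:
    """Count MACs where both the weight and its input activation are nonzero."""
    return sum(1 for row in weights for w, a in zip(row, acts) if w != 0.0 and a != 0.0)


def dense_macs(weights: Matrix) -> int:
    """MACs for the unpruned layer: one per weight."""
    return sum(len(row) for row in weights)


if __name__ == "__main__":
    # Toy 2x4 layer and a 4-element input; values are illustrative only.
    W = [[0.9, -0.05, 0.3, 0.01],
         [0.02, 0.7, -0.04, 0.6]]
    x = [1.2, 0.1, 0.0, 0.8]

    Wp = prune_weights(W, threshold=0.1)
    xp = prune_activations(x, keep_ratio=0.5)

    dense = dense_macs(W)
    both = effective_macs(Wp, xp)
    print("dense:", dense)                                 # dense: 8
    print("weights only:", effective_macs(Wp, x))          # weights only: 3
    print("activations only:", effective_macs(W, xp))      # activations only: 4
    print("both:", both)                                   # both: 2
    print(f"saved: {1 - both / dense:.0%}")                # saved: 75%
```

In this toy case, either form of pruning alone leaves 3 or 4 of the 8 MACs. Pruning both leaves only 2, because a MAC survives only where a kept weight lines up with a kept activation. This is the intuition behind balancing sparsity across the two, although the real savings depend on how the pruned positions overlap in a trained network.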
