Pruning vs Quantization: Which is Better?

Published: 21 Sept 2023, Last Modified: 10 Jan 2024, NeurIPS 2023 poster
Keywords: Neural network quantization, neural network pruning, magnitude pruning, post-training quantization, quantization-aware training
TL;DR: An extensive comparison between quantization and pruning containing analytical error computations for data distributions and error lower bounds for post-training scenarios
Abstract: Neural network pruning and quantization techniques are almost as old as neural networks themselves. However, to date, only ad-hoc comparisons between the two have been published. In this paper, we set out to answer the question of which is better: neural network quantization or pruning? By answering this question, we hope to inform design decisions made on neural network hardware going forward. We provide an extensive comparison between the two techniques for compressing deep neural networks. First, we give an analytical comparison of expected quantization and pruning error for general data distributions. Then, we provide lower and upper bounds for the per-layer pruning and quantization error in trained networks and compare these to empirical error after optimization. Finally, we provide an extensive experimental comparison for 8 large-scale models trained on 3 tasks and give insights into the representations learned during fine-tuning with quantization and pruning in the loop. Our results show that in most cases quantization outperforms pruning. Only in some scenarios with a very high compression ratio might pruning be beneficial from an accuracy standpoint.
Supplementary Material: pdf
Submission Number: 13436
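
The core comparison described in the abstract, per-layer magnitude-pruning error versus post-training quantization error at a matched compression ratio, can be illustrated with a small toy experiment. The sketch below is not taken from the paper: the Gaussian weight tensor, the absmax symmetric quantizer, the MSE metric, and the rule matching b-bit quantization against keeping b/16 of fp16 weights are all illustrative assumptions.

```python
# Toy sketch: per-tensor MSE of magnitude pruning vs uniform symmetric
# quantization at roughly matched compression ratios (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(0.0, 1.0, size=10_000)  # stand-in for one layer's weights

def quantize_error(w, bits):
    """MSE of uniform symmetric (absmax) quantization to `bits` bits."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels
    w_q = np.clip(np.round(w / scale), -levels, levels) * scale
    return np.mean((w - w_q) ** 2)

def prune_error(w, keep_ratio):
    """MSE of magnitude pruning that keeps the largest `keep_ratio` fraction."""
    k = int(keep_ratio * w.size)
    threshold = np.sort(np.abs(w))[-k] if k > 0 else np.inf
    w_p = np.where(np.abs(w) >= threshold, w, 0.0)
    return np.mean((w - w_p) ** 2)

# Assumed matching rule: b-bit quantization vs keeping b/16 of fp16 weights.
for bits in (2, 3, 4, 8):
    print(f"{bits}-bit quant MSE = {quantize_error(w, bits):.3e}   "
          f"prune keep {bits}/16 MSE = {prune_error(w, bits / 16):.3e}")
```

On this toy Gaussian example you should typically see the qualitative picture the abstract reports: quantization incurs far lower error at moderate bit-widths, and only at the most aggressive compression ratio does magnitude pruning become competitive.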