Training Deep Neural Networks with Joint Quantization and Pruning of Features and Weights

29 Sept 2021 (modified: 13 Feb 2023) — ICLR 2022 Conference Withdrawn Submission
Keywords: Deep Learning, Quantization, Pruning, Feature Sparsity, Model Compression, Unstructured Feature Pruning
Abstract: Quantization and pruning are widely used to reduce the inference costs of deep neural networks. In this work, we propose a framework for training deep neural networks with novel methods for uniform quantization and unstructured pruning applied to both features and weights. We demonstrate that our method delivers higher performance per memory footprint than existing state-of-the-art solutions. Using our framework, we empirically evaluate the prune-then-quantize paradigm and the independence assumption across a wide range of computer vision tasks, and we observe that quantization and pruning do not commute when applied to both features and weights.
One-sentence Summary: We propose a framework for training deep neural networks with novel methods for uniform quantization and unstructured pruning applied to both features and weights.
Supplementary Material: zip
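
To make the non-commutativity claim concrete, below is a minimal sketch, not the paper's actual method, contrasting prune-then-quantize with quantize-then-prune on a single weight tensor. It assumes per-tensor symmetric uniform quantization and magnitude-based unstructured pruning in PyTorch; the function names, bit width, and sparsity level are illustrative choices.

import torch

def uniform_quantize(x: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    # Symmetric per-tensor uniform quantization (illustrative, not the paper's scheme):
    # map x onto integer levels in [-qmax, qmax], then dequantize back to float.
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max() / qmax
    q = torch.clamp(torch.round(x / scale), -qmax, qmax)
    return q * scale

def unstructured_prune(x: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    # Magnitude-based unstructured pruning: zero out the smallest |x| entries
    # so that roughly `sparsity` fraction of the tensor becomes zero.
    threshold = torch.quantile(x.abs().flatten(), sparsity)
    return x * (x.abs() > threshold)

# Compare the two orderings on the same random weight tensor.
torch.manual_seed(0)
w = torch.randn(256, 256)
prune_then_quantize = uniform_quantize(unstructured_prune(w, 0.5), num_bits=4)
quantize_then_prune = unstructured_prune(uniform_quantize(w, num_bits=4), 0.5)

# The two results generally differ, i.e. the operations do not commute.
print("max |difference|:", (prune_then_quantize - quantize_then_prune).abs().max().item())

The gap arises because pruning changes the tensor's magnitude statistics (and hence the quantization scale), while quantization perturbs the magnitudes that pruning ranks; the same interaction applies when the operations are trained jointly on features as well as weights.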