Adaptive Quantization and Pruning of Deep Neural Networks via Layer Importance Estimation

Published: 09 Oct 2024, Last Modified: 19 Nov 2024, Compression Workshop @ NeurIPS 2024, CC BY 4.0
Keywords: Image Classification, Model Compression, Adaptive Quantization, Adaptive Pruning, Layer Importance
Abstract: Deep neural networks (DNNs) have achieved remarkable performance in various applications, but their deployment on edge devices is hindered by significant computational and storage requirements. To mitigate these challenges, quantization has proven effective in reducing model size while maintaining accuracy, with pruning further enhancing model compression. However, achieving an optimal balance between compression and performance, particularly in mixed-precision strategies that allocate different bit widths to individual layers, remains a challenge. In this paper, we present a method that ranks layers based on their statistical importance and adaptively selects the bit-width precision and pruning level for each layer, ensuring minimal accuracy loss. Our approach dynamically determines layer-specific thresholds, optimizing compression without the need for complex tuning or costly optimization. We validate our interpretable and efficient method on image classification tasks, demonstrating its effectiveness across multiple DNN architectures. Experimental results show that our method maintains classification accuracy levels of 91.16% for VGG19, 86.06% for ResNet18, and 86.13% for ResNet34 on the CIFAR-10 dataset while reducing the average bit width to 1.08, 2.66, and 2.42 bits, respectively.
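To make the idea of importance-driven bit-width allocation concrete, below is a minimal PyTorch sketch. It is an illustrative assumption, not the authors' exact procedure: the importance score (mean absolute weight magnitude), the quantile thresholds, and the bit-width choices are all hypothetical stand-ins for the paper's statistical ranking and layer-specific thresholds.

```python
import torch
import torch.nn as nn


def layer_importance(module: nn.Module) -> float:
    """Hypothetical statistical importance score: mean absolute weight magnitude."""
    weights = torch.cat([p.detach().abs().flatten() for p in module.parameters()])
    return weights.mean().item()


def assign_bit_widths(model: nn.Module, bit_choices=(2, 4, 8)) -> dict:
    """Rank Conv/Linear layers by importance and map them to bit widths.

    The quantile-based thresholds below are illustrative assumptions,
    not the rule used in the paper.
    """
    layers = [(name, m) for name, m in model.named_modules()
              if isinstance(m, (nn.Conv2d, nn.Linear))]
    scores = torch.tensor([layer_importance(m) for _, m in layers])

    # Layer-specific thresholds derived from the score distribution.
    t_low = torch.quantile(scores, 0.33)
    t_high = torch.quantile(scores, 0.66)

    assignment = {}
    for (name, _), s in zip(layers, scores):
        if s <= t_low:            # least important layers get the fewest bits
            assignment[name] = bit_choices[0]
        elif s <= t_high:
            assignment[name] = bit_choices[1]
        else:                     # most important layers keep higher precision
            assignment[name] = bit_choices[2]
    return assignment
```

A pruning ratio per layer could be chosen from the same ranking, e.g. pruning more aggressively in layers that fall below the lower threshold.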
Submission Number: 90