Abstract: The evolution of deep neural networks (DNNs) naturally leads to an increase in model size. This necessitates model compression techniques, such as pruning and quantization, to reduce memory usage and power consumption. In particular, combining these compression techniques can yield significant cost savings. However, we found that methods combining dynamic pruning with quantization suffer from unstable training and poor generalization due to the interaction of the two Straight-Through Estimators (STEs), one for pruning and one for quantization. To address this problem, we propose Quantization-aware training with Dynamic and Static pruning (QADS), which takes advantage of both pruning and quantization by applying the STE only to quantization after a certain point in training. In our experiments, the proposed method exhibits more stable training than existing techniques and achieves performance improvements on the CIFAR-10/100, ImageNet, and Google Speech Commands datasets. The code is provided at https://github.com/Ahnho/Quantization-aware-training-with-Dynamic-and-Static-Pruning.
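To make the core idea concrete, below is a minimal sketch of a quantization STE combined with a pruning mask that is frozen after a switch point, so that only the quantization STE remains in the backward pass. This is an illustrative assumption of the mechanism described above, written in PyTorch-style code; the names (`QuantSTE`, `masked_quantized_weight`, `switch_epoch`) are hypothetical and not taken from the paper's repository.

```python
import torch


class QuantSTE(torch.autograd.Function):
    """Uniform fake quantization with a straight-through estimator (STE):
    the forward pass rounds the weights, the backward pass treats rounding
    as the identity so gradients flow through unchanged."""

    @staticmethod
    def forward(ctx, w, num_bits=8):
        qmax = 2 ** (num_bits - 1) - 1
        scale = w.abs().max().clamp(min=1e-8) / qmax
        return torch.round(w / scale) * scale  # fake-quantized weights

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # identity gradient (the STE)


def masked_quantized_weight(w, mask, epoch, switch_epoch=50):
    """Hypothetical illustration: before `switch_epoch` the pruning mask may
    still change (dynamic pruning); afterwards it is frozen (static pruning),
    leaving only the quantization STE active in the backward pass."""
    if epoch >= switch_epoch:
        mask = mask.detach()  # stop gradients through the (now static) mask
    return QuantSTE.apply(w * mask)
```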