Adaptive Rounding Compensation for Post-training Quantization

Published: 2022, Last Modified: 22 Jan 2026. Venue: ICONIP (5) 2022. License: CC BY-SA 4.0
Abstract: Network quantization compresses and accelerates deep neural networks by reducing the bit-width of network parameters, so that the quantized networks can be deployed on resource-limited devices. Post-Training Quantization (PTQ) is a practical method for generating a hardware-friendly quantized network without re-training or fine-tuning. However, PTQ can cause unacceptable accuracy degradation because of the disturbance introduced by clipping and by discarding the rounded remains. To address this problem, we propose Adaptive Rounding Compensation Quantization (ARCQ), which reduces quantization errors by utilizing the rounded remains and the clipping threshold that can be computed on resource-limited devices. Moreover, to balance accuracy and speed, we propose a dynamic compensation method that selects the critical layers to be compensated based on their parameters and quantization errors. Extensive experiments verify that our method achieves superior results on ImageNet for classification and MSCOCO for object detection. Code is available at https://github.com/Iconip2022/ARCQ.
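To make the two error sources named in the abstract concrete, the following is a minimal sketch of generic symmetric uniform quantization. It is not the ARCQ method, and it assumes a single clipping threshold per tensor. The function name `quantize` and the field names are hypothetical and chosen only for illustration. For each value it reports the clipping error and the remainder discarded by rounding.

```python
def quantize(values, threshold, bits=4):
    """Generic symmetric uniform quantizer (illustrative, not ARCQ).

    Clips each value to [-threshold, threshold], rounds it to the nearest
    integer code, and records the two error terms that are lost.
    """
    qmax = 2 ** (bits - 1) - 1   # largest integer code, e.g. 7 for 4 bits
    scale = threshold / qmax     # step size between quantization levels
    rows = []
    for v in values:
        clipped = max(-threshold, min(threshold, v))
        scaled = clipped / scale
        code = round(scaled)     # Python rounds exact halves to even
        rows.append({
            "code": code,
            "dequantized": code * scale,
            # Part of the value lost because it exceeded the threshold.
            "clip_error": v - clipped,
            # Part of the value lost because of rounding to an integer code.
            "round_remainder": (scaled - code) * scale,
        })
    return rows


if __name__ == "__main__":
    for row in quantize([0.26, -1.5, 0.9], threshold=1.0, bits=4):
        print(row)
```

In this sketch each original value equals `dequantized + round_remainder + clip_error` up to floating-point error, which is why a scheme that retains the remainders and the threshold has, in principle, enough information to compensate for the quantization error.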