ARC: Adaptive Rounding and Clipping Considering Gradient Distribution for Deep Convolutional Neural Network Training
Abstract: In convolutional neural networks (CNNs), quantization is an effective compression method that uses lower-bit representations to conserve hardware resources in convolution operations, which account for the majority of computations. Most quantization studies have focused on weights and activations. However, gradient quantization, which is central to quantization research for CNN training, is difficult: even small changes in the gradient have a significant impact on training, making it hard to preserve accuracy. Although previous works on gradient quantization achieved high accuracy with stochastic rounding (SR), SR suffers from high latency in random-number generation and complicates register-transfer-level (RTL) hardware design. Additionally, searching for a clipping value based on quantization error is effective early in training but becomes inadequate after the model converges, because the quantization error decreases. In this paper, we address the limitations of SR through an approach based on deterministic rounding, specifically rounding toward zero (RTZ). Additionally, to identify outliers in the gradient distribution, we search for a suitable clipping value based on the z-score, which remains appropriate even after the network converges. Experimental results show that the proposed method achieves higher accuracy on various vision models, such as ResNet, YOLOv5, and YOLACT, and offers robust compatibility. The proposed quantizer was verified in RTL, achieving accuracy similar to that of SR while using resources comparable to nearest rounding. Moreover, when measured on a CPU, the proposed method achieved 43% lower latency than SR.
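The abstract's core idea, combining a z-score-based clipping threshold with deterministic rounding toward zero, can be illustrated with a minimal sketch. This is not the paper's exact formulation: the function name, bit-width, and z-score threshold below are illustrative assumptions.

```python
import numpy as np

def quantize_gradient(grad, num_bits=8, z_thresh=3.0):
    """Sketch: clip a gradient tensor at a z-score-based threshold,
    then quantize with rounding toward zero (RTZ) instead of SR.
    Bit-width and z_thresh are assumed values for illustration."""
    # Treat values beyond z_thresh standard deviations as outliers;
    # use that boundary as the symmetric clipping value.
    mean, std = grad.mean(), grad.std()
    clip_val = abs(mean) + z_thresh * std
    clipped = np.clip(grad, -clip_val, clip_val)

    # Uniform scale onto a signed integer grid.
    qmax = 2 ** (num_bits - 1) - 1
    scale = clip_val / qmax if clip_val > 0 else 1.0

    # RTZ (truncation) is deterministic and cheap to realize in RTL,
    # unlike SR, which needs a random number per element.
    q = np.trunc(clipped / scale)

    # Dequantize back to floating point for the weight update.
    return q * scale

# Usage example on a synthetic gradient tensor.
g = (np.random.randn(4, 4) * 1e-3).astype(np.float32)
print(quantize_gradient(g))
```

The sketch only shows why RTZ avoids SR's random-number generation; the paper's adaptive search for the clipping value during training is not reproduced here.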