Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Withdrawn Submission
Abstract: Model quantization discretizes the weights and activations of a deep neural network (DNN). Unlike previous methods that manually define quantization hyperparameters such as precision (i.e., bitwidth), dynamic range (i.e., minimum and maximum discrete values), and stepsize (i.e., interval between discrete values), this work proposes a novel approach, named Differentiable Dynamic Quantization (DDQ), that learns all of them differentiably and possesses several appealing benefits. (1) Unlike previous works that apply a rounding operation to discretize values, DDQ provides a unified perspective by formulating discretization as a matrix-vector product, where different values of the matrix and vector represent different quantization methods such as mixed precision and soft quantization; these values can be learned differentiably from training data, allowing different hidden layers of a DNN to use different quantization methods. (2) DDQ is hardware-friendly: all variables can be computed using low-precision matrix-vector multiplication, making it applicable to a wide spectrum of hardware. (3) The matrix variable is carefully reparameterized to reduce its number of parameters from O(2^{b^2}) to O(\log 2^b), where b is the bitwidth. Extensive experiments show that DDQ outperforms prior art on various advanced networks and benchmarks. For instance, MobileNetv2 trained with DDQ achieves top-1 accuracy on ImageNet comparable to the full-precision model (71.7% vs. 71.9%), while ResNet18 trained with DDQ increases accuracy by 0.5%. Compared to the full-precision models, these results represent relative improvements of 70% and 140% over recent state-of-the-art quantization methods.
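
To make the matrix-vector view of discretization more concrete, below is a minimal PyTorch-style sketch of soft quantization with a learnable level vector, where a (soft) assignment matrix times the level vector produces the quantized values. The function name soft_quantize, the temperature parameter, and the softmax-based assignment are illustrative assumptions for this sketch, not the exact DDQ formulation from the paper.

    import torch

    def soft_quantize(x, levels, temperature=0.1):
        # Illustrative soft quantization as a matrix-vector product (sketch only).
        # Each input value gets a soft assignment (a row of matrix U) over the
        # learnable quantization levels; the quantized output is U @ levels.
        # A one-hot U corresponds to hard nearest-level rounding.
        dist = (x.reshape(-1, 1) - levels.reshape(1, -1)).abs()
        # Rows of U sum to 1; as temperature -> 0 this approaches nearest-level rounding.
        U = torch.softmax(-dist / temperature, dim=-1)
        return (U @ levels).reshape(x.shape)

    # Example: four learnable levels (2-bit) with non-uniform spacing (adaptive resolution).
    levels = torch.nn.Parameter(torch.tensor([-1.0, -0.3, 0.3, 1.0]))
    x = torch.randn(8)
    xq = soft_quantize(x, levels)  # differentiable w.r.t. both x and levels

Because the output is differentiable with respect to the level vector (and, with a suitable parameterization, the assignment matrix), gradients from the task loss can adjust the dynamic range, stepsize, and effective precision during training, which is the general idea the abstract describes.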
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=XPKYdzT5rF