DECOUPLE QUANTIZATION STEP AND OUTLIER-MIGRATED RECONSTRUCTION FOR PTQ

20 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: post-training quantization, model compression
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: DOMR outperforms the current best method by 12.93% in Top-1 accuracy for W2A2 quantization on MobileNet-v2.
Abstract: Post-training quantization (PTQ) is a popular technique for compressing deep learning models due to its low cost and high efficiency. However, in some extremely low-bit settings, PTQ still suffers from significant performance degradation. In this work, we reveal two related obstacles: (1) the setting of the weight's quantization step has not been fully explored, and (2) outlier activations beyond the clipping range are ignored by most methods, which is especially harmful for lightweight models and low-bit settings. To overcome these two obstacles, we propose \textbf{DOMR}, which (1) fully explores the setting of the weight's quantization step, dividing it into five cases through \textbf{D}ecoupling, based on the often-ignored fact that integer weights (unlike integer activations) can be obtained before actual inference deployment, and (2) saves outliers into the safe clipping range under a predefined bitwidth with \textbf{O}utlier-\textbf{M}igrated \textbf{R}econstruction, exploiting the structure of CNNs and PTQ's clipping operation. Saving more outliers is equivalent to breaking the bitwidth constraint of predefined hardware, and thus brings better performance. Extensive experiments on various networks demonstrate that DOMR establishes a new SOTA in PTQ. Specifically, DOMR outperforms the current best method by 12.93\% in Top-1 accuracy for W2A2 on MobileNet-v2. The code will be released.
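The clipping problem the abstract refers to can be seen in standard uniform quantization: any activation above the top of the quantization grid saturates to the largest representable level and its information is lost. The sketch below illustrates this generic PTQ behavior only (it is not the DOMR method); the step size and bitwidth values are illustrative assumptions.

```python
import numpy as np

def uniform_quantize(x, step, n_bits):
    """Uniform affine quantization with clipping, as used in PTQ.

    Values above step * (2**n_bits - 1) saturate to the top level,
    so 'outlier' activations beyond the clipping range are lost.
    """
    q_max = 2 ** n_bits - 1
    x_int = np.clip(np.round(x / step), 0, q_max)  # quantize + clip
    return x_int * step                            # dequantize

# Illustrative 2-bit example: grid levels are {0.0, 0.5, 1.0, 1.5}.
acts = np.array([0.1, 0.5, 1.2, 3.0])  # 3.0 is an outlier
print(uniform_quantize(acts, step=0.5, n_bits=2))
# The outlier 3.0 saturates to 1.5, a large reconstruction error.
```

Under 2 bits the in-range values land near their true magnitudes, while the outlier collapses to the clipping boundary; saving such outliers within the same bitwidth is what Outlier-Migrated Reconstruction targets.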
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2422