Keywords: Post-Training Quantization
Abstract: Post-training quantization (PTQ) is an effective technique for accelerating DNN model inference, where activations typically follow a bell-shaped distribution. Since commodity hardware employs a linear quantization grid with limited quantization levels, prior PTQ methods optimize a clipping threshold to minimize overall quantization error, which excludes outliers from the bell-shaped data. However, outliers have a non-trivial impact on low-bit and lightweight models. Thus, OCS (Zhao et al., 2019) proposed to preserve outliers by halving and duplicating them. However, for activation quantization, the original OCS sacrifices the precision of the regular inliers, leading to severe accuracy degradation. To address this, we propose OCS+, which preserves outlier activations without affecting the regular inliers. Consequently, OCS+ theoretically achieves a one-bit-higher representation on hardware with a predefined bitwidth. OCS+ is based on an offline mathematical transformation, so it requires neither additional training nor hardware redesign. Experiments on CNNs and ViTs demonstrate that OCS+ significantly outperforms OCS and improves current PTQ SOTAs, e.g., by 12.73% in Acc@1 for W2A2 MobileNet-v2. The code will be released.
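For intuition, here is a minimal NumPy sketch of the halve-and-duplicate idea behind OCS that the abstract refers to: an outlier activation channel is split into two half-magnitude copies and the matching weight column is duplicated, so the layer output is unchanged while the activation range that must be quantized shrinks. Function and variable names are illustrative assumptions, not the paper's implementation (and OCS+ itself uses a different, inlier-preserving transformation).

```python
import numpy as np

# Illustrative sketch of OCS-style "halve and duplicate" on one activation channel.
# Not the paper's code: names, shapes, and the splitting rule are assumptions.

def split_activation_channel(x, W, j):
    """Replace activation channel j with two half-magnitude copies and duplicate
    the matching weight column, so W @ x is preserved while the activation's
    maximum magnitude (and hence the clipping range) is halved."""
    x_split = np.insert(x, j + 1, x[j] / 2.0)   # insert the duplicate copy
    x_split[j] = x[j] / 2.0                     # halve the original copy
    W_split = np.insert(W, j + 1, W[:, j], axis=1)  # duplicate weight column j
    return x_split, W_split

rng = np.random.default_rng(0)
x = rng.normal(size=8)
x[3] = 10.0                        # inject an outlier activation
W = rng.normal(size=(4, 8))

x2, W2 = split_activation_channel(x, W, 3)
assert np.allclose(W @ x, W2 @ x2)             # layer output is unchanged
print(np.abs(x).max(), np.abs(x2).max())       # outlier magnitude halved: 10.0 -> 5.0
```

The trade-off the abstract points out is visible here: the split adds a channel, and when the same trick is applied to activations under a fixed quantization grid, the inlier channels can lose precision, which is the limitation OCS+ is designed to remove.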
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4300