CL-Calib: Enhancing Post-training Quantization Calibration through Contrastive Learning

23 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: network compression
Abstract: Post-training quantization (PTQ) converts a pre-trained full-precision (FP) model into a quantized model in a training-free manner. Determining suitable quantization parameters, such as scaling factors and weight rounding, is the primary strategy for mitigating the impact of quantization noise (calibration) and restoring the performance of the quantized model. However, existing activation calibration methods do not account for the information degradation between pre-quantization (FP) and post-quantization activations. In this study, we introduce a well-defined distributional metric from information theory, mutual information, into PTQ calibration. We aim to calibrate the quantized activations by maximizing the mutual information between the pre- and post-quantized activations. To realize this goal, we establish a contrastive learning (CL) framework for PTQ calibration, in which the quantization parameters are optimized through a self-supervised proxy task. Specifically, by leveraging CL during the PTQ process, we pull together positive pairs of quantized and FP activations collected from the same input samples, while pushing apart negative pairs from different samples. Thanks to a carefully designed critic function, we avoid the unwanted but often-encountered collapsed solution in CL, especially in calibration scenarios where the amount of calibration data is limited. Additionally, we provide a theoretical guarantee that minimizing our designed loss is equivalent to maximizing the desired mutual information. Consequently, the quantized activations retain more information, which ultimately enhances the performance of the quantized network. Experimental results show that our method can effectively serve as an add-on module to existing state-of-the-art PTQ methods.
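To make the general idea concrete, the sketch below shows a standard InfoNCE-style contrastive loss between FP and quantized activations, where positives are FP/quantized pairs from the same calibration sample and negatives come from other samples in the batch; minimizing such a loss is a common lower-bound surrogate for maximizing mutual information. This is only an illustrative sketch, not the paper's actual critic function: the function names (`contrastive_calibration_loss`, `fake_quant`), the temperature, the bit-width, and the LSQ-style straight-through quantizer are all assumptions introduced here for the example.

```python
# Illustrative sketch (assumed names and hyperparameters, not the authors' critic):
# optimize a quantization scale by maximizing an InfoNCE lower bound on the
# mutual information between FP and quantized activations.
import torch
import torch.nn.functional as F

def contrastive_calibration_loss(fp_act, q_act, temperature=0.1):
    """fp_act, q_act: (batch, features) activations from the same inputs."""
    fp = F.normalize(fp_act.flatten(1), dim=1)
    q = F.normalize(q_act.flatten(1), dim=1)
    logits = q @ fp.t() / temperature                      # (batch, batch) similarities
    targets = torch.arange(fp.size(0), device=fp.device)   # diagonal entries are positives
    return F.cross_entropy(logits, targets)

def fake_quant(x, scale, n_bits=8):
    """Symmetric fake quantization with a straight-through estimator on round()."""
    qmax = 2 ** (n_bits - 1) - 1
    x_scaled = x / scale
    x_int = (x_scaled.round() - x_scaled).detach() + x_scaled  # STE: gradient bypasses round()
    x_int = x_int.clamp(-qmax - 1, qmax)
    return x_int * scale

# Toy usage: calibrate a single per-tensor scale on random "activations".
fp_act = torch.randn(32, 256)
scale = torch.tensor(0.05, requires_grad=True)
opt = torch.optim.Adam([scale], lr=1e-3)
for _ in range(100):
    loss = contrastive_calibration_loss(fp_act, fake_quant(fp_act, scale))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In this toy setup, the only learnable quantization parameter is a single scale; in a real PTQ calibration pipeline the same loss could be applied layer-wise over a small calibration set, with whatever quantization parameters the base PTQ method exposes.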
Supplementary Material: pdf
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7689