CAT: Post-Training Quantization Error Reduction via Cluster-based Affine Transformation

18 Sept 2025 (modified: 03 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Post-Training Quantization, Deep Learning, Image Classification, Convolutional Neural Networks
TL;DR: We develop a novel Error Reduction approach for Post-Training Quantization via Cluster-based Affine Transformation
Abstract: Post-Training Quantization (PTQ) reduces the memory footprint and computational overhead of deep neural networks by converting full-precision (FP) values into quantized and compressed data types. While PTQ is more cost-efficient than Quantization-Aware Training (QAT), it is highly susceptible to accuracy degradation under a low-bit quantization (LQ) regime (e.g., 2-bit and 4-bit). Affine transformation is a classical technique used to reduce the discrepancy between the information processed by a quantized model and that processed by its full-precision counterpart; however, we find that using plain affine transformation, which applies a uniform affine parameter set for all outputs, is ineffective in low-bit PTQ. To address this, we propose Cluster-based Affine Transformation (CAT), an error reduction framework that applies cluster-specific affine transformation to align LQ and FP outputs. CAT directly refines quantized outputs with only a negligible number of additional parameters. Experiments on ImageNet-1K demonstrate that CAT consistently outperforms prior PTQ methods across diverse architectures and low-bit settings, achieving up to 53.18% Top-1 accuracy on W2A2 ResNet-18, and delivering improvements of more than 3% when combined with strong PTQ baselines. We plan to release CAT's code alongside the publication of this paper.
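The abstract describes fitting cluster-specific affine parameters to map quantized outputs onto their full-precision counterparts. A minimal sketch of that idea is below; it is not the authors' implementation (their code is unreleased), and the choice of k-means for clustering and a per-channel least-squares closed form for the affine fit are assumptions for illustration.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Toy k-means over rows of x; returns cluster labels and centers."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels, centers

def fit_cluster_affine(q, f, labels, k, eps=1e-8):
    """Per-cluster, per-channel scale A and shift B minimizing
    ||A*q + B - f||^2, i.e. aligning quantized outputs q to FP outputs f."""
    d = q.shape[1]
    A, B = np.ones((k, d)), np.zeros((k, d))
    for j in range(k):
        m = labels == j
        if not np.any(m):
            continue  # leave identity transform for empty clusters
        qm, fm = q[m], f[m]
        qmean, fmean = qm.mean(0), fm.mean(0)
        cov = ((qm - qmean) * (fm - fmean)).mean(0)
        A[j] = cov / (qm.var(0) + eps)        # least-squares slope
        B[j] = fmean - A[j] * qmean           # least-squares intercept
    return A, B

# Usage: cluster quantized outputs, fit affine params, refine outputs.
rng = np.random.default_rng(0)
q = rng.normal(size=(200, 4))                 # stand-in quantized outputs
f = 2.0 * q + 1.0                             # stand-in FP outputs
labels, _ = kmeans(q, k=2)
A, B = fit_cluster_affine(q, f, labels, k=2)
corrected = A[labels] * q + B[labels]         # cluster-specific refinement
```

Because each cluster gets its own (A, B), the correction can track locally varying quantization error that a single global affine map (the "plain affine transformation" the abstract critiques) would average away.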
Supplementary Material: pdf
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 10973