SAMPQ: Saliency-aware Mixed-Precision Quantization

ICLR 2026 Conference Submission 13300 Authors

18 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Mixed-precision quantization, Saliency detection, Fine-grained optimization
TL;DR: A novel sample-aware mixed-precision quantization method.
Abstract: Although mixed-precision quantization (MPQ) achieves a remarkable accuracy-complexity trade-off, conventional gradient-based MPQ methods are susceptible to input noise, which leads to suboptimal bit-width allocation strategies. Through saliency analysis, we show that treating all feature regions of a sample as equally significant exacerbates the quantization error in MPQ. To mitigate this issue, we propose saliency-aware MPQ (SAMPQ), a novel framework designed to dynamically evaluate sample saliency. In particular, SAMPQ is formulated as a three-stage cascade-optimized training procedure. In the first stage, the neural network (NN) weights are trained on vanilla samples with the bit-width configuration tentatively fixed. In the second stage, saliency maps are generated from the one-step-optimized weights. In the third stage, the bit-width allocation is optimized on saliency-reweighted samples while the NN weights are frozen. By iteratively alternating these optimization phases, SAMPQ enables the quantized NN modules to focus on fine-grained features. Experiments conducted on benchmark datasets demonstrate the effectiveness of our proposed method within existing MPQ frameworks.
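The abstract's alternating three-stage loop can be sketched roughly as follows. This is a minimal illustration under several assumptions not taken from the submission: the model, the learnable bit-width logits, the input-gradient saliency estimate, and the names QuantModel and saliency_map are all hypothetical stand-ins, not the authors' implementation.

```python
# Hypothetical sketch of the three-stage alternation described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuantModel(nn.Module):
    """Toy network whose bit-width choice is parameterized by learnable logits (assumption)."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(3 * 32 * 32, 10)
        # One logit per candidate bit-width (e.g., 2/4/8 bits), a common
        # gradient-based MPQ parameterization; assumed here for illustration.
        self.bit_logits = nn.Parameter(torch.zeros(3))

    def forward(self, x):
        probs = F.softmax(self.bit_logits, dim=0)
        scales = torch.tensor([0.25, 0.5, 1.0])  # crude stand-in for 2/4/8-bit fidelity
        w = self.fc.weight * (probs * scales).sum()  # soft mixture over bit choices
        return F.linear(x.flatten(1), w, self.fc.bias)

def saliency_map(model, x, y):
    """Input-gradient magnitude as a stand-in saliency estimate (assumption)."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    s = grad.abs().mean(dim=1, keepdim=True)  # per-pixel saliency
    return s / (s.amax(dim=(2, 3), keepdim=True) + 1e-8)

model = QuantModel()
weight_params = [p for n, p in model.named_parameters() if n != "bit_logits"]
w_opt = torch.optim.SGD(weight_params, lr=0.1)       # updates NN weights only
b_opt = torch.optim.SGD([model.bit_logits], lr=0.01)  # updates bit-width logits only

x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
for _ in range(3):
    # Stage 1: train weights on vanilla samples; bit-width configuration stays fixed.
    w_opt.zero_grad()
    F.cross_entropy(model(x), y).backward()
    w_opt.step()
    # Stage 2: build saliency maps from the one-step-optimized weights.
    s = saliency_map(model, x, y)
    # Stage 3: optimize bit-width allocation on saliency-reweighted samples; weights frozen.
    b_opt.zero_grad()
    F.cross_entropy(model(x * s), y).backward()
    b_opt.step()
```

The separate optimizers make the alternation explicit: each stage steps only its own parameter group, matching the freeze/update pattern the abstract describes.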
Primary Area: optimization
Submission Number: 13300