QAT-SAM: Accurate Quantization for Segment Anything Model 2

TMLR Paper7986 Authors

18 Mar 2026 (modified: 07 May 2026) | Under review for TMLR | CC BY 4.0
Abstract: The Segment Anything Model 2 (SAM2) is a powerful foundation model for promptable segmentation. However, its high computational and memory costs are a major barrier to deployment on resource-constrained devices. In this paper, we present QAT-SAM, a low-bit quantization method that substantially improves robustness over prior QAT baselines at extreme bit-widths while delivering large model-size reductions. To address the performance degradation caused by challenging weight and activation distributions during quantization, QAT-SAM introduces two novel contributions: Variance-Reduced Calibration (VRC), an initialization method that reduces the statistical variance of the weights by minimizing the Frobenius norm over a small calibration batch; and Learnable Statistical Clipping (LSC), a Quantization-Aware Training (QAT) method that learns momentum-stabilized clipping factors to manage outliers in weights and activations. Comprehensive experiments demonstrate that QAT-SAM substantially closes the QAT accuracy gap to full precision at low bit-widths and significantly outperforms prior QAT baselines, particularly in the ultra-low 2-bit regime. Specifically, QAT-SAM achieves an accuracy gain of up to 9.7 percentage points in J&F on the video segmentation benchmark and 7.3 percentage points in mIoU for instance segmentation over the best competing QAT model, while achieving an 8x reduction in model size compared to the BF16 baseline.
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Fanhua_Shang2
Submission Number: 7986
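
As a rough illustration of two ideas mentioned in the abstract, the sketch below shows (a) clipping values to a statistics-based range before uniform low-bit quantization and (b) the idealized arithmetic behind an 8x size reduction from 16-bit to 2-bit storage. It is not the paper's LSC or VRC procedure: the abstract does not give their formulas, so the mean ± alpha·std clipping rule, the fixed (not learned) `alpha`, and all function names here are assumptions made for illustration only.

```python
import statistics


def clip_bounds(values, alpha):
    """Hypothetical clipping range: mean +/- alpha * population std."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return mu - alpha * sigma, mu + alpha * sigma


def fake_quantize(values, bits, alpha):
    """Clip to the statistical range, then snap to 2**bits uniform levels.

    In QAT, alpha would be a trainable parameter; here it is a fixed
    number so the sketch stays dependency-free.
    """
    lo, hi = clip_bounds(values, alpha)
    levels = 2 ** bits - 1
    # Guard against a zero-width range (all values identical).
    scale = (hi - lo) / levels if hi > lo else 1.0
    out = []
    for v in values:
        clipped = min(max(v, lo), hi)       # outliers saturate at the bounds
        q = round((clipped - lo) / scale)   # integer code in [0, levels]
        out.append(lo + q * scale)          # de-quantized value
    return out


def compression_ratio(base_bits, quant_bits):
    """Idealized size ratio, ignoring scale/offset metadata overhead."""
    return base_bits / quant_bits


# Toy weights with one large outlier (made-up numbers).
weights = [-0.9, -0.3, -0.1, 0.0, 0.05, 0.2, 0.4, 6.0]
quantized = fake_quantize(weights, bits=2, alpha=1.0)
ratio = compression_ratio(16, 2)
```

With `bits=2` the output takes at most four distinct values, and the outlier `6.0` saturates at the upper clipping bound instead of stretching the quantization grid. `compression_ratio(16, 2)` gives the factor of 8 that matches the abstract's BF16-to-2-bit comparison, under the simplifying assumption that every stored value is quantized and metadata is negligible.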