BLOB-Q: Boosting Low Bit ViT Quantization via Global Optimization on Model Distortion

ICLR 2026 Conference Submission15984 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Efficient Inference, Vision Transformer Quantization, Model Compression
Abstract: In this paper, we present a novel Mixed-Precision Post-Training Quantization (PTQ) approach for Vision Transformers (ViTs). Our approach aims to minimize the output distortion caused by quantization and can thus maximally maintain the accuracy of ViT models even when they are quantized to low bit widths. Unlike prior works, which typically optimize the output error of the current layer (layer distortion) when performing quantization, our approach directly minimizes the output error of the last layer of the model (model distortion), which is far more closely related to accuracy. We formulate the quantization of ViTs as a model distortion optimization problem under a model size constraint. Solving this problem yields the optimal bit allocation across layers, i.e., the bit width of each layer that minimizes model distortion. Solving the problem directly is NP-hard. We therefore propose to approximate model distortion with the second-order term of its Taylor series expansion, under which an important additivity property can be derived. Utilizing this additivity property, the optimization problem can be decomposed into sub-problems and solved efficiently in an iterative manner. Specifically, we propose a dynamic programming algorithm that finds the globally optimal solution with only linear time complexity. Extensive experiments on six ViT models demonstrate the effectiveness of our approach: it significantly improves upon the state of the art and can further reduce ViT models to 4 to 6 bits without hurting accuracy.
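The core idea in the abstract, once per-layer distortions are additive under the second-order approximation, admits a knapsack-style dynamic program over layers and bit widths. The sketch below is a hypothetical illustration of that formulation, not the authors' implementation: the layer sizes, the candidate bit widths, and the `distortion[i][b]` table are all made-up inputs standing in for the Taylor-approximated per-layer distortions.

```python
# Hypothetical sketch of bit allocation by dynamic programming.
# Under the additivity assumption, total model distortion is the sum of
# per-layer distortions, so we can sweep layers and track, for each
# accumulated size, the lowest-distortion allocation seen so far.
# All concrete numbers here are illustrative, not from the paper.

def allocate_bits(layer_params, distortion, bit_choices, size_budget):
    """layer_params[i]: parameter count of layer i.
    distortion[i][b]: approximated distortion of layer i at bit width b.
    size_budget: total allowed model size, in bits.
    Returns (minimum total distortion, list of bit widths per layer)."""
    # dp maps accumulated size (bits) -> (total distortion, allocation)
    dp = {0: (0.0, [])}
    for params, layer_dist in zip(layer_params, distortion):
        nxt = {}
        for used, (dist, alloc) in dp.items():
            for b in bit_choices:
                size = used + params * b
                if size > size_budget:
                    continue  # violates the size constraint
                total = dist + layer_dist[b]
                # keep only the best allocation for each accumulated size
                if size not in nxt or total < nxt[size][0]:
                    nxt[size] = (total, alloc + [b])
        dp = nxt
    if not dp:
        raise ValueError("size budget too small for any allocation")
    return min(dp.values(), key=lambda t: t[0])


# Illustrative usage: three layers, a 6-bit average budget.
layer_params = [100, 200, 150]
bit_choices = [4, 6, 8]
distortion = [
    {4: 0.9, 6: 0.3, 8: 0.1},
    {4: 1.5, 6: 0.5, 8: 0.2},
    {4: 0.8, 6: 0.25, 8: 0.08},
]
best_dist, bits = allocate_bits(
    layer_params, distortion, bit_choices,
    size_budget=6 * sum(layer_params),
)
```

The state here is the accumulated size in bits, so the table grows with the budget rather than staying strictly linear in the number of layers; it is meant only to show how additivity turns the global problem into independent per-layer choices stitched together by a DP.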
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 15984