Less Precise Can Be More Reliable: A Systematic Evaluation of Quantization’s Impact on VLMs Beyond Accuracy
TL;DR: A large-scale analysis suggesting that quantization can unexpectedly improve VLM reliability by filtering out fine-grained features, indicating that lower precision does not always mean lower quality.
Abstract: Vision-Language Models (VLMs) such as CLIP have revolutionized zero-shot classification and safety-critical tasks, including Out-of-Distribution (OOD) detection. However, their high computational cost hinders efficient real-world deployment. While quantization is a standard solution for efficiency, its broader impact on reliability metrics beyond simple Top-1 accuracy remains critically under-explored. In this study, we conduct a large-scale evaluation of VLM quantization across a comprehensive experimental suite of over 700k evaluation runs with varying configurations. We find that, contrary to the assumption that quantization's noise degrades performance, it can simultaneously improve accuracy, calibration, OOD detection, and robustness to noise, though not to covariate shift or spurious correlations. We leverage these counterintuitive findings to characterize the mechanics of quantization beyond simple regularization: we show that quantization dampens high-rank spectral components, compelling the model to rely more heavily on robust, low-rank features. Ultimately, this spectral filtering effect drives the observed improvements in generalization and noise tolerance, establishing a pathway to deploy faster, more reliable VLMs by utilizing quantization beyond its conventional role.
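The sketch below is one way to probe the spectral claim above on a single weight matrix. It assumes NumPy. It takes the SVD W = U diag(s) V^T, forms the quantization error E = Q(W) - W with a plain symmetric per-tensor quantizer, and reports |u_i^T E v_i| / s_i. That ratio measures how strongly the rounding error acts along each singular direction relative to that direction's strength. The quantizer, the 4-bit setting, the synthetic low-rank-plus-noise matrix, and the function names are illustrative assumptions. They are not the paper's quantization methods or its analysis protocol.

```python
import numpy as np


def fake_quantize(w: np.ndarray, n_bits: int = 4) -> np.ndarray:
    """Symmetric per-tensor uniform quantization, returned in float (quantize + dequantize)."""
    qmax = 2 ** (n_bits - 1) - 1  # e.g. 7 for 4-bit signed integers
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.clip(np.round(w / scale), -qmax, qmax) * scale


def relative_spectral_perturbation(w: np.ndarray, n_bits: int = 4) -> np.ndarray:
    """For each singular pair (u_i, v_i) of w, return |u_i^T E v_i| / s_i with E = Q(w) - w."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)  # s is sorted in descending order
    err = fake_quantize(w, n_bits) - w
    # Diagonal of U^T E V: the error component along each singular direction.
    along = np.abs(np.diag(u.T @ err @ vt.T))
    return along / np.maximum(s, 1e-12)


# Hypothetical weight matrix: a strong rank-4 component plus weak full-rank "detail".
rng = np.random.default_rng(0)
strong = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 64))
w = strong + 0.05 * rng.normal(size=(64, 64))

rel = relative_spectral_perturbation(w, n_bits=4)
print("head (top-4) mean relative perturbation:", rel[:4].mean())
print("tail mean relative perturbation:", rel[4:].mean())
```

In this toy setup, expect the tail ratios to be far larger than the head ratios. The rounding error has a roughly similar magnitude in every direction, while the singular values fall off sharply, so the weak directions are overwritten and the dominant ones barely move. The sketch does not show whether this translates into the reliability gains reported above; that remains an empirical question.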
Lay Summary: The AI systems that link images and language (VLMs) are powerful but expensive to run. A standard way to shrink them, called quantization, stores their internal numbers with less precision (like rounding to fewer decimals). The conventional view is that this rounding slightly degrades performance, a price worth paying for speed.
We tested that view at scale, with over 700,000 evaluations, and found it can have the opposite effect. Compressed models frequently outperform their originals: they are more accurate, more honest about their own uncertainty, better at flagging unfamiliar inputs they shouldn't trust, and more tolerant of noisy images. They do not, however, fix biases inherited from the training data and often worsen them.
The reason for this is a characteristic filtering effect. A model's knowledge mixes broad, sturdy patterns with a long tail of brittle details, and those brittle details include much of what the model has overfit to. Quantization preferentially erases the fragile details while leaving the dominant patterns intact, much as a slight blur can make a photograph's main subject easier to recognize even though it wipes out the finest details. With the right approach and use case, compression can yield systems that are simultaneously cheaper to run and more reliable.
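The minimal sketch below makes the "rounding to fewer decimals" described above concrete, using symmetric, per-tensor uniform quantization in plain Python. It is a generic textbook scheme chosen for illustration, and the names `quantize_symmetric` and `dequantize` are hypothetical. It is not necessarily one of the configurations evaluated in the paper. It shows only the simplest, per-value form of the filtering intuition: any value smaller than half a quantization step rounds to zero, while the largest values keep their relative size. The paper's own account of the effect is spectral rather than per-value.

```python
from typing import List, Tuple


def quantize_symmetric(weights: List[float], n_bits: int = 8) -> Tuple[List[int], float]:
    """Map floats onto a signed integer grid with one shared scale (per-tensor, symmetric)."""
    qmax = 2 ** (n_bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Python's round() rounds halves to even; the clamp guards against float overshoot.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate floats from the integer codes."""
    return [v * scale for v in q]


weights = [1.27, -0.5, 0.001, 0.0]
q, scale = quantize_symmetric(weights, n_bits=8)
restored = dequantize(q, scale)
for w, r in zip(weights, restored):
    print(f"{w:+.4f} -> {r:+.4f}")
```

At 8 bits the step here is about 0.01, so 0.001 is erased while 1.27 and -0.5 survive almost unchanged. Lowering `n_bits` widens the step and erases progressively larger values.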
Originally Submitted Supplementary Material: zip
Link To Code: https://github.com/CEA-LIST/less-precise-more-reliable-vlms
Primary Area: Deep Learning->Robustness
Keywords: Quantization, VLMs, Calibration, Uncertainty, Zero-Shot, Robustness, Computer Vision, Spurious Correlations
Originally Submitted PDF: pdf
Submission Number: 16536