Precision-Induced Miscalibration: Understanding and Correcting Confidence Distortion in Quantized Neural Networks

Jiawei Gu; Fengyuan Nie; Hao Tang; Yanpeng Sun

Published: 30 Apr 2026, Last Modified: 24 Jun 2026

Venue: ICML 2026 regular

License: CC BY 4.0

Abstract: Low-precision arithmetic is pervasive in neural network training and deployment, yet its effect on prediction confidence, not just accuracy, remains largely unexamined. We show that the softmax function amplifies logit-space quantization errors in an input-dependent manner: confidence distortion scales with the product of a precision-dependent error bound $\epsilon$ and the logit norm, peaking when the model is confident but not saturated. This explains why the same model reports different confidence values at different precisions, a phenomenon we term Precision Split. During training, the same mechanism causes gradient underflow: when logit margins exceed a precision-dependent threshold, gradients vanish and samples silently stop contributing to learning. Because the logit norm serves as a computable proxy for precision-induced risk, we propose Precision-Aware Confidence Scaling (PACS), which applies a sample-adaptive temperature inversely related to this risk, with less than 1% overhead and no full-precision computation required. On ImageNet with mixed-precision ResNet-50, PACS reduces Expected Calibration Error (ECE) from 5.82% to 1.92% while maintaining accuracy, with consistent improvements across architectures, precision formats, and modalities.
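To make the two mechanisms in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' code and does not implement PACS, whose temperature function is not given on this page. It assumes: a hypothetical four-class logit vector; float16 as the low-precision format (NumPy has no native bfloat16); that "gradients vanish" means the cross-entropy gradient p - onehot(y) is exactly zero when softmax and gradient are both computed in that format; and the common 15-bin equal-width ECE. All helper names (`confidence_after_rounding`, `first_vanishing_margin`, and so on) are hypothetical.

```python
import numpy as np


def softmax(z, dtype=np.float64):
    """Numerically stable softmax evaluated entirely in `dtype`."""
    z = np.asarray(z, dtype=dtype)
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def confidence_after_rounding(logits, logit_dtype):
    """Max-softmax confidence when only the logits are rounded to `logit_dtype`.

    The rounded logits are upcast and the softmax runs in float64, so any
    change versus the float64 reference comes from logit-space rounding alone.
    """
    rounded = np.asarray(logits, dtype=np.float64).astype(logit_dtype)
    return float(softmax(rounded, dtype=np.float64).max())


def gradient_vanishes(margin, dtype):
    """True if the cross-entropy gradient p - onehot(y) is exactly zero.

    Toy two-class sample with logits [margin, 0] and true class 0. The whole
    computation runs in `dtype` (an assumption: real mixed-precision kernels
    often accumulate softmax in higher precision).
    """
    p = softmax([margin, 0.0], dtype=dtype)
    grad = p - np.array([1.0, 0.0], dtype=dtype)
    return bool(np.all(grad == 0))


def first_vanishing_margin(dtype, margins):
    """Smallest margin in `margins` whose gradient is exactly zero, else None."""
    for m in margins:
        if gradient_vanishes(m, dtype):
            return float(m)
    return None


def expected_calibration_error(confidences, correct, n_bins=15):
    """Standard equal-width binned ECE (n_bins=15 is an assumed default)."""
    conf = np.asarray(confidences, dtype=np.float64)
    acc = np.asarray(correct, dtype=np.float64)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            # Weight |accuracy - confidence| by the fraction of samples in the bin.
            ece += in_bin.mean() * abs(acc[in_bin].mean() - conf[in_bin].mean())
    return float(ece)


if __name__ == "__main__":
    logits = [8.3, 2.1, -1.7, 0.4]  # hypothetical logits for one sample
    ref = confidence_after_rounding(logits, np.float64)
    q16 = confidence_after_rounding(logits, np.float16)
    print(f"confidence: float64 logits {ref:.6f}, float16 logits {q16:.6f}")

    margins = np.arange(0.0, 60.0, 0.25)
    print("fp16 gradient vanishes at margin:", first_vanishing_margin(np.float16, margins))
    print("fp32 gradient vanishes at margin:", first_vanishing_margin(np.float32, margins))
```

In this toy setting the float16 threshold arises when exp(-margin) rounds to zero, i.e. falls below half the smallest float16 subnormal (2^-25), which happens for margins beyond about 17.3. In float32 the same underflow needs a margin above roughly 100, so the scan over [0, 60) finds none. Because production kernels usually accumulate in higher precision, thresholds observed in practice may differ from this sketch.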

Lay Summary: Modern AI models often run on "low-precision" math, which saves energy and memory and makes them faster. But we found a hidden cost: low precision quietly makes a model too sure of its answers. The model is just as accurate, but the confidence it reports is too high. This matters when AI is used in cars, hospitals, or banks, where trusting a wrong "I am sure" can be dangerous. We traced this problem to a simple cause: the small rounding errors of low-precision math get magnified when the model turns its raw scores into confidence values. Because we understand the cause, we can predict which answers are at risk and gently correct them. Our method, called PACS, does this in a few simple steps, with no extra training data and almost no extra time. As a result, the model's confidence becomes much more honest, while its accuracy stays the same.

Originally Submitted Supplementary Material: pdf

Link To Code: https://drive.google.com/file/d/1TKLAlXrxwDlgNREa9VYU9xXPMDwUMDTZ/view?usp=drive_link

Primary Area: Applications->Everything Else

Keywords: Model calibration

Originally Submitted PDF: pdf

Submission Number: 1849

Loading