Uncertainty estimation in classification via weighted test-time augmentation

TMLR Paper6048 Authors

30 Sept 2025 (modified: 20 Oct 2025) | Under review for TMLR | CC BY 4.0
Abstract: In classification, deep learning models are considered superior to standard statistical models in terms of prediction accuracy. However, these models are often overconfident in their predictions, which inhibits their use in safety-critical applications, where mistakes can have disastrous consequences. To address this issue, several uncertainty quantification methods have been proposed to produce more reliable predictions and better calibration. In this paper, we focus on a universal uncertainty quantification method called test-time augmentation (TTA). We then present a weighted version of test-time augmentation (WTTA) that introduces weights into the algorithm to generate better augmentations. We illustrate our approach with various models and data sets. In a simulation study, where the estimates can be compared against the true uncertainties in the data, we show that the WTTA approach produces better uncertainty estimates. Furthermore, the method is applied to two benchmark data sets used in the development of machine learning models. Although it is a rather simple post-processing method, WTTA is arguably able to outperform standard TTA and temperature scaling in terms of calibration error and prediction accuracy, especially on small data sets.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Kamalika_Chaudhuri1
Submission Number: 6048