Improving Vision Model Robustness against Misclassification and Uncertainty Attacks via Underconfidence Adversarial Training

Josué Martínez-Martínez; John T Holodnak; Olivia Brown; Sheida Nabavi; Derek Aguiar; Allan Wollaber

Published: 05 Nov 2025, Last Modified: 01 Dec 2025. Venue: NLDL 2026 Poster. License: CC BY 4.0

Keywords: adversarial training, uncertainty attacks, adversarial attacks, robustness, confidence manipulation, underconfidence attack, miscalibration, AI security

TL;DR: This work extends adversarial robustness to underconfidence attacks, proposing two novel attacks and a defense that improves robustness while using half the steps of standard adversarial training.

Abstract: Adversarial robustness research has focused on defending against misclassification attacks. However, such adversarially trained models remain vulnerable to underconfidence adversarial attacks, which reduce the model’s confidence without changing the predicted class. Decreased confidence can result in unnecessary interventions, delayed diagnoses, and a weakening of trust in automated systems. In this work, we introduce two novel underconfidence attacks: one that induces ambiguity between a class pair, and ConfSmooth, which spreads uncertainty across all classes. For defense, we propose Underconfidence Adversarial Training (UAT), which embeds our underconfidence attacks in an adversarial training framework. We extensively benchmark our underconfidence attacks and defense strategies across six model architectures (both CNN and ViT-based) and seven datasets (MNIST, CIFAR, ImageNet, MSTAR, and medical imaging). In 14 of the 15 data-architecture combinations, our attack outperforms the state of the art, often substantially. Our UAT defense maintains the highest robustness against all underconfidence attacks on CIFAR-10, and achieves robustness against misclassification attacks that is comparable to or better than that of adversarial training while taking half of the gradient steps. By broadening the scope of adversarial robustness to include uncertainty-aware threats and defenses, UAT enables more robust computer vision systems.
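
To make the threat model concrete, the following is a minimal sketch of a generic underconfidence attack on a toy linear softmax classifier. It is not the formulation used in the paper: the entropy-maximization objective, the toy model, and all names (`underconfidence_attack`, `eps`, `alpha`, `steps`) are assumptions chosen for illustration. The sketch only demonstrates the property stated in the abstract, namely that a bounded perturbation lowers the confidence of the predicted class while leaving the prediction unchanged.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy_grad_wrt_input(W, b, x):
    """Gradient of the predictive entropy H(p) w.r.t. x for logits z = W x + b."""
    p = softmax(W @ x + b)
    log_p = np.log(p + 1e-12)
    H = -(p * log_p).sum()
    dH_dz = -p * (log_p + H)      # dH/dz_i = -p_i (log p_i + H)
    return W.T @ dH_dz            # chain rule through the linear layer

def underconfidence_attack(W, b, x, eps=0.1, alpha=0.02, steps=20):
    """Hypothetical PGD-style attack: raise entropy inside an L-inf ball of
    radius eps, accepting a step only if the predicted class is preserved
    and its confidence drops. Entropy ascent is an illustrative stand-in,
    not the loss from the paper."""
    p_clean = softmax(W @ x + b)
    label = int(np.argmax(p_clean))
    x_adv, best_conf = x.copy(), p_clean[label]
    for _ in range(steps):
        g = entropy_grad_wrt_input(W, b, x_adv)
        cand = np.clip(x_adv + alpha * np.sign(g), x - eps, x + eps)
        p = softmax(W @ cand + b)
        if int(np.argmax(p)) == label and p[label] < best_conf:
            x_adv, best_conf = cand, p[label]
        else:
            alpha *= 0.5          # step rejected: retry with a smaller step
    return x_adv, label, best_conf

# Toy setup (assumed): 3 classes, 8 input features, random weights.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))
b = np.zeros(3)
x = rng.normal(size=8)

x_adv, label, adv_conf = underconfidence_attack(W, b, x)
clean_conf = softmax(W @ x + b)[label]
print(f"class {label}: confidence {clean_conf:.3f} -> {adv_conf:.3f}")
```

Accepting only label-preserving steps is one simple way to separate this threat from a misclassification attack; a defense in the spirit of UAT would then train on such perturbed inputs, but the training objective itself is not specified in this abstract and is not sketched here.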

Submission Number: 35