Generalizing Robustness from $\ell_p$ to Unforeseen Attacks via Calibrated Adversarial Sampling

Published: 29 Sept 2025, Last Modified: 12 Oct 2025
Venue: NeurIPS 2025 - Reliable ML Workshop
License: CC BY 4.0
Keywords: Adversarial robustness
Abstract: Deep Neural Networks (DNNs) are known to be vulnerable to a wide range of adversarial perturbations. To address the safety concerns arising from these vulnerabilities, adversarial training (AT) has emerged as one of the most effective paradigms for enhancing the robustness of DNNs. However, existing AT frameworks primarily target a single attack type or a limited set of them, leaving DNNs exposed to unforeseen attack types that were not addressed during training. In this paper, we explore a new robust-generalization paradigm that fine-tunes robust DNNs to cope with unforeseen attacks. To this end, we propose Calibrated Adversarial Sampling (CAS), a method that dynamically adjusts sampling probabilities during fine-tuning to balance robustness across adversarial attack types. CAS operates in three key phases: sample-wise robustness testing, warm-up fine-tuning, and dynamic fine-tuning. Experiments on benchmark datasets show that CAS achieves superior overall robustness while maintaining clean accuracy and effectively balancing robustness across different attack types, providing a new paradigm for the robust generalization of DNNs.
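The abstract does not specify the calibration rule, so the following is only a minimal sketch of what dynamically adjusted sampling probabilities might look like, assuming probabilities are derived from per-attack robust accuracy via a temperature-scaled softmax over robustness gaps. The attack names, the `per_attack_robust_acc` stub, and the softmax weighting are all illustrative assumptions, not the paper's actual method.

```python
import numpy as np

# Hypothetical attack suite; the names are placeholders, not the
# attacks evaluated in the paper.
ATTACKS = ["linf_pgd", "l2_pgd", "l1_pgd", "stadv", "recolor"]

def per_attack_robust_acc(model, batch):
    """Stub for the sample-wise robustness testing phase: returns an
    estimated robust accuracy of `model` under each attack type.
    In practice each entry would come from evaluating adversarial
    examples crafted by the corresponding attack."""
    return np.random.uniform(0.2, 0.8, size=len(ATTACKS))

def calibrated_probs(robust_acc, temperature=1.0):
    """Weight attacks inversely to current robustness: the weaker the
    model is against an attack, the more often that attack is sampled
    during fine-tuning (a numerically stable softmax)."""
    logits = (1.0 - robust_acc) / temperature
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def sample_attack(probs, rng):
    """Draw the attack to use for the next fine-tuning batch."""
    return ATTACKS[rng.choice(len(ATTACKS), p=probs)]

rng = np.random.default_rng(0)
acc = per_attack_robust_acc(None, None)   # phase 1: robustness testing
probs = calibrated_probs(acc)             # phase 3: dynamic calibration
print(dict(zip(ATTACKS, np.round(probs, 3))))
print("next attack:", sample_attack(probs, rng))
```

Under this reading, the warm-up phase could correspond to a high temperature (near-uniform sampling across attacks), with the dynamic phase lowering it to concentrate fine-tuning on the attacks where robustness currently lags.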
Submission Number: 9