Abstract: Adversarial training (AT) is widely regarded as a leading defense strategy for improving the robustness of deep learning models against adversarial attacks. However, existing AT methods often rely on a single attack strategy during training, which limits exploration of the perturbation space and leads to poor generalization robustness against stronger, unseen, or adaptive adversarial attacks. Moreover, most AT approaches overlook class-wise robustness (the observed variation in robustness across different image classes) by focusing solely on average performance over the entire dataset. In this paper, we present Advanced Distributional Training with Class-wise Robustness (ADT++), a novel adversarial training framework that significantly improves generalization robustness against unseen and sophisticated adversarial attacks. Following the standard adversarial training framework, ADT++ is formulated as a min-max optimization problem: the inner maximization learns the worst-case adversarial distribution around adversarial examples to further explore the perturbation space, and the outer minimization seeks model parameters that minimize the expected value of the maximized inner loss. To further improve generalization robustness, ADT++ leverages the class-wise robustness phenomenon by targeting the most vulnerable image classes with high-loss adversarial attacks to generate more impactful adversarial examples. Extensive evaluations on benchmark datasets, against various AT defense methods and adversarial attacks, confirm the effectiveness of ADT++ in improving model robustness against stronger and adaptive attacks. The source code of ADT++ is available at https://github.com/LAiSR-SK/ADT2Plus.
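As a rough sketch of the min-max structure described in the abstract, the objective may take a form such as the one below. The notation is assumed here for illustration and is not taken from the paper: $f_\theta$ is the classifier with parameters $\theta$, $\mathcal{L}$ the training loss, $\mathcal{D}$ the data distribution, $x^{\mathrm{adv}}$ an adversarial example generated from $x$, and $\mathcal{P}$ a set of admissible perturbation distributions supported on an $\epsilon$-ball.

```latex
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}}
\left[ \max_{p(\delta)\in\mathcal{P}} \;
\mathbb{E}_{\delta\sim p(\delta)}
\left[ \mathcal{L}\big(f_{\theta}(x^{\mathrm{adv}}+\delta),\, y\big) \right] \right],
\qquad
\operatorname{supp}(p) \subseteq \{\delta : \|\delta\|_{\infty}\le\epsilon\}
```

The exact formulation in the paper, including the choice of norm and how class-wise targeting enters the inner maximization, may differ.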
External IDs: dblp:conf/dsaa/KhamaisehJAAJ25