Conflict-Aware Adversarial Training


TMLR Paper4053 Authors

25 Jan 2025 (modified: 09 May 2025) · Rejected by TMLR · License: CC BY 4.0

Abstract: Adversarial training, which directly involves adversarial samples in the training procedure, is widely regarded as the most effective method for obtaining adversarial robustness in deep neural networks. To obtain a model that is both accurate and robust, a weighted-average method is applied to optimize the standard loss and the adversarial loss simultaneously. In this paper, we argue that the weighted-average method does not provide the best trade-off between standard performance and adversarial robustness. We attribute this failure to the conflict between the gradients of the standard and adversarial losses, and we show, both theoretically and empirically, that this conflict increases with the attack budget. To alleviate this problem, we propose a new trade-off paradigm for adversarial training, named Conflict-Aware Adversarial Training (CA-AT), which uses a conflict-aware factor in the convex combination of the standard and adversarial losses. Comprehensive experiments show that CA-AT consistently offers a better trade-off between standard performance and adversarial robustness, both when training from scratch and in parameter-efficient fine-tuning.
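The quantities named in the abstract can be illustrated with a minimal sketch, assuming a logistic-regression model and an ℓ∞ attack (neither is stated in the abstract). It computes the gradient of the weighted-average objective (1 - λ)·L_std + λ·L_adv and measures the conflict between the standard and adversarial gradients by their cosine similarity as the attack budget ε grows. The function names, the synthetic data, the weight λ, and the use of cosine similarity as the conflict measure are assumptions for illustration. The sketch does not implement CA-AT's conflict-aware factor, which the abstract does not define.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sign(v):
    return [1.0 if vi > 0 else (-1.0 if vi < 0 else 0.0) for vi in v]


def cosine(a, b):
    # Cosine similarity; negative values mean the two gradients conflict.
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


def mean_grad(w, xs, ys, eps):
    """Gradient w.r.t. w of the mean logistic loss log(1 + exp(-y * w.x)),
    evaluated at inputs perturbed to x - eps * y * sign(w). For a linear model
    this is the exact worst case in the l_inf ball of radius eps, so eps = 0
    yields the standard-loss gradient and eps > 0 the adversarial-loss gradient."""
    s = sign(w)
    g = [0.0] * len(w)
    for x, y in zip(xs, ys):
        x_adv = [xi - eps * y * si for xi, si in zip(x, s)]
        coef = -y * sigmoid(-y * dot(w, x_adv))
        for j, xj in enumerate(x_adv):
            g[j] += coef * xj
    return [gj / len(xs) for gj in g]


def weighted_average_grad(g_std, g_adv, lam):
    # Gradient of the fixed convex combination (1 - lam) * L_std + lam * L_adv
    # (hypothetical weighted-average baseline, not CA-AT).
    return [(1.0 - lam) * gs + lam * ga for gs, ga in zip(g_std, g_adv)]


def gradient_cosines(epsilons, d=10, n=2000, seed=0):
    """Cosine between standard and adversarial gradients for each attack budget,
    on synthetic data labelled by a hypothetical unit-norm linear model w."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(dot(w, w))
    w = [wi / norm for wi in w]
    xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    ys = [1.0 if dot(w, x) >= 0 else -1.0 for x in xs]
    g_std = mean_grad(w, xs, ys, 0.0)
    return {eps: cosine(g_std, mean_grad(w, xs, ys, eps)) for eps in epsilons}


if __name__ == "__main__":
    for eps, c in gradient_cosines([0.0, 0.05, 0.1, 0.25, 0.5, 1.0]).items():
        print(f"eps={eps:.2f}  cos(g_std, g_adv)={c:+.3f}")
```

For a linear model the ℓ∞ worst case has the closed form x - ε·y·sign(w), so no iterative attack is needed here. For a deep network the adversarial input would instead come from an iterative attack such as PGD, and the two gradients would be taken over all network parameters.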

Submission Length: Regular submission (no more than 12 pages of main content)

Assigned Action Editor: ~Quanshi_Zhang1

Submission Number: 4053
