Fortify the Guardian, Not the Treasure: Resilient Adversarial Detectors

TMLR Paper2573 Authors

23 Apr 2024 (modified: 30 Jun 2024) | Under review for TMLR | License: CC BY-SA 4.0
Abstract: This paper presents RADAR (Robust Adversarial Detection via Adversarial Retraining), an approach designed to make adversarial detectors more robust to adaptive attacks while maintaining classifier performance. An adaptive attack is one in which the attacker is aware of the defenses and adapts their strategy accordingly. The proposed method uses adversarial training to strengthen the detector's ability to detect attacks without compromising clean accuracy. During training, we augment the dataset with adversarial examples optimized to fool both the classifier and the adversarial detector, so that the detector learns to recognize and adapt to such attacks.
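The training idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss or the attack, so everything below is an assumption made for illustration only: a logistic-regression classifier and detector, a PGD-style signed-gradient attack, and a joint objective that simply sums the two binary cross-entropy losses. The names `adaptive_pgd` and `retrain_detector` are hypothetical and are not taken from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def sign(v):
    return (v > 0) - (v < 0)

def adaptive_pgd(x, y, clf, det, eps=1.0, alpha=0.25, steps=10):
    """Assumed adaptive attack: ascend classifier loss + detector loss.

    clf and det are (weights, bias) pairs of logistic models. The
    detector outputs P(input is adversarial), so its true label for
    the perturbed input is 1 and the attacker pushes its output down.
    """
    (wc, bc), (wd, bd) = clf, det
    x_adv = list(x)
    for _ in range(steps):
        pc = sigmoid(dot(wc, x_adv) + bc)  # classifier P(class 1)
        pd = sigmoid(dot(wd, x_adv) + bd)  # detector P(adversarial)
        # Gradient w.r.t. x of BCE(pc, y) + BCE(pd, 1).
        grad = [(pc - y) * c + (pd - 1.0) * d for c, d in zip(wc, wd)]
        # Signed step, then projection onto the L-infinity ball around x.
        x_adv = [min(max(xa + alpha * sign(g), x0 - eps), x0 + eps)
                 for xa, g, x0 in zip(x_adv, grad, x)]
    return x_adv

def retrain_detector(det, clean, adversarial, lr=0.5, epochs=200):
    """Adversarial retraining of the detector only (classifier untouched).

    Clean inputs are labelled 0 and adaptive adversarial inputs 1;
    plain SGD on the binary cross-entropy loss.
    """
    wd, bd = list(det[0]), det[1]
    data = [(x, 0.0) for x in clean] + [(x, 1.0) for x in adversarial]
    for _ in range(epochs):
        for x, t in data:
            err = sigmoid(dot(wd, x) + bd) - t
            wd = [w - lr * err * xi for w, xi in zip(wd, x)]
            bd -= lr * err
    return wd, bd

# Toy usage: craft adaptive examples, then retrain the detector on them.
clf = ([1.0, -1.0], 0.0)
det = ([-0.2, 0.2], 0.0)
clean = [[1.0, 0.0], [2.0, 0.5]]
adv = [adaptive_pgd(x, 1, clf, det) for x in clean]
det = retrain_detector(det, clean, adv)
```

Because only the detector's parameters are updated in this sketch, the classifier's clean accuracy is unchanged by construction, which mirrors the abstract's stated goal. In practice the craft-and-retrain loop would presumably be repeated against the updated detector.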
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
* Ablation study has been added.
* Comparison to other detectors has been added.
* ImageNet1k subset has been added.
* Classification accuracy table has been added.
* Definition of K in Eq. 1 has been added.
* Layout of section 5 has been fixed.
Assigned Action Editor: ~W_Ronny_Huang1
Submission Number: 2573