Keywords: adversarial training, improving generalization, robustness-accuracy tradeoff
TL;DR: Instance-adaptive adversarial training for improving the robustness-accuracy tradeoff
Abstract: Adversarial training is by far the most successful strategy for improving the robustness of neural networks to adversarial attacks. Despite its success as a defense mechanism, adversarial training fails to generalize well to the unperturbed test set. We hypothesize that this poor generalization is a consequence of adversarial training with a uniform perturbation radius around every training sample. Samples close to the decision boundary can be morphed into a different class under a small perturbation budget, and enforcing large margins around these samples produces poorly shaped decision boundaries that generalize poorly. Motivated by this hypothesis, we propose instance adaptive adversarial training -- a technique that enforces sample-specific perturbation margins around every training sample. We show that with our approach, test accuracy on unperturbed samples improves with only a marginal drop in robustness. Extensive experiments on the CIFAR-10, CIFAR-100 and ImageNet datasets demonstrate the effectiveness of our proposed approach.
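The abstract amounts to PGD adversarial training in which the perturbation radius is a per-sample quantity adapted during training rather than a single global constant. The PyTorch sketch below illustrates that idea only; it is not the authors' algorithm. In particular, the radius-update rule (grow the radius for samples that stay correctly classified under attack, shrink it for samples that flip), the assumption that the DataLoader yields sample indices, and all names and hyperparameter values (`pgd_attack`, `eps_per_sample`, `eps_step`) are illustrative assumptions.

```python
# Minimal sketch of instance adaptive adversarial training (assumptions noted above).
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha, steps):
    """L-inf PGD where `eps` is a per-sample radius of shape (B, 1, 1, 1)."""
    delta = torch.empty_like(x).uniform_(-1.0, 1.0) * eps
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + alpha * grad.sign()).detach()
        delta = torch.clamp(delta, min=-eps, max=eps)  # project to per-sample ball
        delta = torch.clamp(x + delta, 0.0, 1.0) - x   # keep pixels in [0, 1]
    return (x + delta).detach()

def train_epoch(model, loader, optimizer, eps_per_sample, device,
                eps_step=2 / 255, eps_max=16 / 255, alpha=2 / 255, steps=7):
    model.train()
    for x, y, idx in loader:  # assumes the loader also yields sample indices
        x, y = x.to(device), y.to(device)
        eps = eps_per_sample[idx].to(device).view(-1, 1, 1, 1)
        x_adv = pgd_attack(model, x, y, eps, alpha, steps)

        # Hypothetical adaptation rule: samples that remain correctly
        # classified under attack can tolerate a larger margin; samples that
        # flip are likely near the decision boundary, so shrink their radius.
        with torch.no_grad():
            still_correct = model(x_adv).argmax(dim=1) == y
        new_eps = torch.where(still_correct,
                              eps.view(-1) + eps_step,
                              eps.view(-1) - eps_step)
        eps_per_sample[idx] = new_eps.clamp(0.0, eps_max).cpu()

        loss = F.cross_entropy(model(x_adv), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The design intuition follows the abstract: under a uniform radius, boundary-adjacent samples force the classifier to carve out margins that cut into neighboring classes, while letting their radii shrink recovers clean accuracy. For the paper's actual selection criterion, see the code implementation linked below.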
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:1910.08051/code)