On Adversarial Training without Perturbing all Examples

11 May 2023 (modified: 12 Dec 2023)Submitted to NeurIPS 2023EveryoneRevisionsBibTeX
Keywords: adversarial robustness, adversarial training, adversarial robust transfer
TL;DR: We propose a subset adv. training experiment to investigate robustness interactions between classes, examples and tasks and find that few attacked examples generalise surprisingly well
Abstract: Adversarial training is the de-facto standard for improving robustness against adversarial examples. This usually involves a multi-step adversarial attack applied on each example during training. In this paper, we explore only constructing adversarial examples (AE) on a subset of the training examples. That is, we split the training set in two subsets $A$ and $B$, train models on both ($A\cup B$) but construct AEs only for examples in $A$. Starting with $A$ containing only a single class, we systematically increase the size of $A$ and consider splitting by class and by examples. We observe that: (i) adv. robustness transfers by difficulty and to classes in $B$ that have never been adv. attacked during training, (ii) we observe a tendency for hard examples to provide better robustness transfer than easy examples, yet find this tendency to diminish with increasing complexity of datasets (iii) generating AEs on only $50$% of training data is sufficient to recover most of the baseline AT performance even on ImageNet. We observe similar transfer properties across tasks, where generating AEs on only $30$% of data can recover baseline robustness on the target task. We evaluate our subset analysis on a wide variety of image datasets like CIFAR-10, CIFAR-100, ImageNet-200 and show transfer to SVHN, Oxford-Flowers-102 and Caltech-256. In contrast to conventional practice, our experiments indicate that the utility of computing AEs varies by class and examples and that weighting examples from $A$ higher than $B$ provides high transfer performance.
Supplementary Material: pdf
Submission Number: 15141