Broad Adversarial Training with Data Augmentation in the Output Space

Nils Worzyk; Stella Yu

Published: 02 Dec 2021, Last Modified: 05 May 2023 | AAAI-22 AdvML Workshop LongPaper | Readers: Everyone

Keywords: Adversarial Training, Adversarial, Adversarial Defence, Output Space

TL;DR: Increasing the efficiency of adversarial training by data augmentation in the output space

Abstract: In image classification, data augmentation and the use of additional data have been shown to increase the efficiency of clean training and the accuracy of the resulting model. However, this does not prevent models from being fooled by adversarial manipulations. Adversarial Training (AT) is an easy yet effective and widely used method to increase robustness by hardening neural networks against adversarial inputs. Still, AT is computationally expensive and creates only one adversarial input per sample of the current batch. We propose Broad Adversarial Training (B-AT), which combines adversarial training with data augmentation in the decision space, i.e., on the model's output vector. By adding random noise to the original adversarial output vector, we create multiple pseudo-adversarial instances, thus increasing the data pool for adversarial training. We show that this general idea is applicable to two different learning paradigms, i.e., supervised and self-supervised learning. Using B-AT instead of AT for supervised learning, we increase the robustness by 0.56% for small seen attacks. For medium and larger seen attacks, the robustness increases by 4.57% and 1.11%, respectively. On large unseen attacks, we also report robustness increases of 1.11% and 0.29%. When combining a larger corpus of input data with our proposed method, we report a slight increase in clean accuracy and increased robustness against all observed attacks, compared to AT. In self-supervised training, we observe a similar increase in robust accuracy for seen attacks and large unseen attacks on the downstream task of image classification. In addition, for both observed self-supervised models, the clean accuracy also increases by up to 1.37% using our method.
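The core idea in the abstract (adding random noise to the adversarial output vector to obtain several pseudo-adversarial instances) can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian noise distribution, the parameters `num_copies` and `noise_std`, the function names, and the choice to average a softmax cross-entropy over the original and noisy output vectors are all assumptions made for illustration.

```python
import math
import random


def pseudo_adversarial_outputs(adv_logits, num_copies=4, noise_std=0.1, rng=None):
    """Return noisy copies of one adversarial output vector.

    Assumption: noise is Gaussian with standard deviation noise_std;
    the abstract only says "random noise".
    """
    rng = rng or random.Random()
    return [
        [z + rng.gauss(0.0, noise_std) for z in adv_logits]
        for _ in range(num_copies)
    ]


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single logit vector."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def broad_adversarial_loss(adv_logits, label, num_copies=4, noise_std=0.1, rng=None):
    """Average the loss over the adversarial output and its noisy copies.

    Assumption: a plain mean of cross-entropy terms; the loss used in the
    paper may differ.
    """
    outputs = [list(adv_logits)]
    outputs += pseudo_adversarial_outputs(adv_logits, num_copies, noise_std, rng)
    return sum(cross_entropy(o, label) for o in outputs) / len(outputs)


# Hypothetical usage: one adversarial output vector for a 3-class problem.
example_logits = [2.0, 0.5, -1.0]
example_loss = broad_adversarial_loss(
    example_logits, label=1, rng=random.Random(0)
)
```

In this sketch the extra instances cost only a few vector additions per sample, because the noise is applied after the forward pass rather than by generating additional adversarial inputs.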
