Improving the Generalization of Adversarial Training with Domain AdaptationDownload PDF

Published: 21 Dec 2018, Last Modified: 21 Apr 2024ICLR 2019 Conference Blind SubmissionReaders: Everyone
Abstract: By injecting adversarial examples into training data, adversarial training is promising for improving the robustness of deep learning models. However, most existing adversarial training approaches are based on a specific type of adversarial attack. It may not provide sufficiently representative samples from the adversarial domain, leading to a weak generalization ability on adversarial examples from other attacks. Moreover, during the adversarial training, adversarial perturbations on inputs are usually crafted by fast single-step adversaries so as to scale to large datasets. This work is mainly focused on the adversarial training yet efficient FGSM adversary. In this scenario, it is difficult to train a model with great generalization due to the lack of representative adversarial samples, aka the samples are unable to accurately reflect the adversarial domain. To alleviate this problem, we propose a novel Adversarial Training with Domain Adaptation (ATDA) method. Our intuition is to regard the adversarial training on FGSM adversary as a domain adaption task with limited number of target domain samples. The main idea is to learn a representation that is semantically meaningful and domain invariant on the clean domain as well as the adversarial domain. Empirical evaluations on Fashion-MNIST, SVHN, CIFAR-10 and CIFAR-100 demonstrate that ATDA can greatly improve the generalization of adversarial training and the smoothness of the learned models, and outperforms state-of-the-art methods on standard benchmark datasets. To show the transfer ability of our method, we also extend ATDA to the adversarial training on iterative attacks such as PGD-Adversial Training (PAT) and the defense performance is improved considerably.
Keywords: adversarial training, domain adaptation, adversarial example, deep learning
TL;DR: We propose a novel adversarial training with domain adaptation method that significantly improves the generalization ability on adversarial examples from different attacks.
Code: [![github](/images/github_icon.svg) JHL-HUST/ATDA](https://github.com/JHL-HUST/ATDA) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=SyfIfnC5Ym)
Data: [CIFAR-10](https://paperswithcode.com/dataset/cifar-10), [CIFAR-100](https://paperswithcode.com/dataset/cifar-100), [Fashion-MNIST](https://paperswithcode.com/dataset/fashion-mnist), [SVHN](https://paperswithcode.com/dataset/svhn)
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 2 code implementations](https://www.catalyzex.com/paper/arxiv:1810.00740/code)
11 Replies

Loading