Keywords: Adversarial training, Entropy weighting
TL;DR: We propose an instance-wise entropy-weighted adversarial training to focus on more uncertain examples.
Abstract: Adversarial training methods, which minimizes the loss of adversarially-perturbed training examples, have been extensively studied as a solution to improve the robustness of the deep neural networks. However, most adversarial training methods treat all training examples equally, while each example may have a different impact on the model's robustness during the course of training. Recent works have exploited such unequal importance of adversarial samples to model's robustness, which has been shown to obtain high robustness against untargeted PGD attacks. However, we empirically observe that they make the feature spaces of adversarial samples across different classes overlap, and thus yield more high-entropy samples whose labels could be easily flipped. This makes them more vulnerable to targeted adversarial perturbations. Moreover, to address such limitations, we propose a simple yet effective weighting scheme, Entropy-Weighted Adversarial Training (EWAT), which weighs the loss for each adversarial training example proportionally to the entropy of its predicted distribution, to focus on examples whose labels are more uncertain. We validate our method on multiple benchmark datasets and show that it achieves an impressive increase of robust accuracy.