Keywords: Adversarial training, Entropy weighting
TL;DR: We propose an instance-wise entropy-weighted adversarial training to focus on more uncertain examples.
Abstract: Adversarial training, which minimizes the loss of adversarially-perturbed training examples, has been extensively studied as a solution to improve the robustness of deep neural networks. However, most adversarial training methods treat all training examples equally, while each example may have a different impact on the model's robustness during the course of adversarial training. A couple of recent works have exploited such unequal importance of adversarial samples to the model's robustness by proposing to assign more weights to the misclassified samples or to the samples that violate the margin more severely, which have been shown to obtain high robustness against untargeted PGD attacks. However, we empirically find that they make the feature spaces of adversarial samples across different classes overlap and thus yield more high-entropy samples whose labels could be easily flipped. This makes them more vulnerable to adversarial perturbations, and their seemingly good robustness against PGD attacks is actually achieved by a false sense of robustness. To address such limitations, we propose simple yet effective re-weighting scheme that weighs the loss for each adversarial training example proportionally to the entropy of its predicted distribution to focus on examples whose labels are more uncertain.