THE EFFECT OF ADVERSARIAL TRAINING: A THEORETICAL CHARACTERIZATION

Anonymous

Sep 25, 2019 ICLR 2020 Conference Blind Submission
  • TL;DR: We prove that adversarial training of a linear classifier converges rapidly to a robust solution. In addition, adversarial training is stable to outliers in the dataset.
  • Abstract: It has been widely shown that adversarial training (Madry et al., 2018) is empirically effective in defending against adversarial attacks. However, the theoretical understanding of the difference between the solution of adversarial training and that of standard training is limited. In this paper, we characterize the solution of adversarial training for the linear classification problem over the full range of adversarial radii ε. Specifically, we show that if the data are "ε-strongly linearly separable", adversarial training with radius smaller than ε converges to the hard-margin SVM solution at a faster rate than standard training. If the data are not "ε-strongly linearly separable", we show that adversarial training with radius ε is stable to outliers while standard training is not. Moreover, we prove that the classifier returned by adversarial training with a large radius ε has low confidence in each data point. Experiments corroborate our theoretical findings well.
  • Keywords: adversarial training, robustness, separable data
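The setting described in the abstract can be illustrated with a minimal sketch (not the paper's code; all names and hyperparameters here are illustrative). For a linear classifier w under an ℓ2 perturbation budget ε, the worst-case perturbation of a point (x, y) has a closed form, so the adversarial logistic loss reduces to the loss on the shrunken margin y·⟨w, x⟩ − ε‖w‖, which can be minimized directly by gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy strongly linearly separable data: two well-separated Gaussian blobs.
n = 100
X = np.vstack([rng.normal(+2.0, 0.3, size=(n, 2)),
               rng.normal(-2.0, 0.3, size=(n, 2))])
y = np.hstack([np.ones(n), -np.ones(n)])

def robust_loss_grad(w, X, y, eps):
    """Gradient of the adversarial logistic loss.

    For a linear model, the worst-case l2 perturbation of radius eps
    shrinks each margin y * (w @ x) by eps * ||w||, so the robust loss
    is log(1 + exp(-(y * (X @ w) - eps * ||w||))).
    """
    margins = y * (X @ w) - eps * np.linalg.norm(w)
    s = -1.0 / (1.0 + np.exp(margins))                  # d(loss)/d(margin)
    grad_margin = y[:, None] * X - eps * w / (np.linalg.norm(w) + 1e-12)
    return (s[:, None] * grad_margin).mean(axis=0)

def adversarial_train(X, y, eps, lr=0.1, steps=2000):
    w = rng.normal(size=X.shape[1]) * 0.01              # small random init
    for _ in range(steps):
        w -= lr * robust_loss_grad(w, X, y, eps)
    return w

w = adversarial_train(X, y, eps=0.5)
acc = np.mean(np.sign(X @ w) == y)
print(f"train accuracy: {acc:.2f}")
```

Since the blobs are separated by far more than ε = 0.5, the data are "ε-strongly linearly separable" in the abstract's sense, and the trained direction w approaches the hard-margin SVM direction.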