Abstract: Adversarial training has proven to be one of the most effective methods to defend against
adversarial attacks. Nevertheless, robust overfitting is a common obstacle in adversarial
training of deep networks. There is a common belief that the features learned by different
network layers have different properties; however, existing works generally investigate robust
overfitting by considering a DNN as a single unit and hence the impact of different network
layers on robust overfitting remains unclear. In this work, we divide a DNN into a series of
layers and investigate the effect of different network layers on robust overfitting. We find
that different layers exhibit distinct properties towards robust overfitting, and in particular,
robust overfitting is mostly related to the optimization of the latter parts of the network. Based
upon the observed effect, we propose a robust adversarial training (RAT) prototype: in
a minibatch, we optimize the front parts of the network as usual, and adopt additional
measures to regularize the optimization of the latter parts. Based on this prototype, we
design two realizations of RAT, and extensive experiments demonstrate that RAT can
eliminate robust overfitting and boost adversarial robustness over standard adversarial
training.
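A minimal sketch of the RAT prototype described above, under assumed details (this is not the authors' code and does not correspond to either of the paper's two realizations): the network is split into a "front" part, optimized with plain adversarial training, and a "latter" part whose optimization receives an additional regularizer, illustrated here with a larger weight decay. The split point, the PGD hyperparameters, and the choice of regularizer are all illustrative assumptions.

```python
# Sketch of the RAT prototype: standard adversarial training on the front layers,
# extra regularization on the latter layers (placeholder: larger weight decay).
import torch
import torch.nn as nn
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft L-infinity PGD adversarial examples around the clean inputs."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1).detach()
    return x_adv

# Toy network; where the front/latter split is placed is an assumption.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),   # front part
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),       # latter part starts here
    nn.Linear(64, 10),
)
front_params = list(model[:4].parameters())
latter_params = list(model[4:].parameters())

# Front layers: usual settings; latter layers: additional regularization,
# illustrated here with a larger weight decay on that parameter group.
optimizer = torch.optim.SGD([
    {"params": front_params, "weight_decay": 5e-4},
    {"params": latter_params, "weight_decay": 5e-2},
], lr=0.1, momentum=0.9)

def rat_step(x, y):
    """One minibatch update of adversarial training with the split regularization."""
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```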
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Mathieu_Salzmann1
Submission Number: 935