Abstract: Adversarial training has proven to be one of the most effective methods to defend against adversarial attacks. Nevertheless, robust overfitting is a common obstacle in the adversarial training of deep networks. It is commonly believed that the features learned by different network layers have different properties; however, existing works generally investigate robust overfitting by treating a DNN as a single unit, and hence the impact of different network layers on robust overfitting remains unclear. In this work, we divide a DNN into a series of layers and investigate the effect of each layer on robust overfitting. We find that different layers exhibit distinct properties with respect to robust overfitting; in particular, robust overfitting is mostly related to the optimization of the latter parts of the network. Based on this observation, we propose a \emph{robust adversarial training} (RAT) prototype: within a minibatch, we optimize the front parts of the network as usual and adopt additional measures to regularize the optimization of the latter parts. Building on the prototype, we design two realizations of RAT, and extensive experiments demonstrate that RAT can eliminate robust overfitting and boost adversarial robustness over standard adversarial training.
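A minimal PyTorch sketch of the prototype described above (not the paper's two concrete RAT realizations): it splits a toy network into a front part and a latter part and regularizes only the latter part's optimization by scaling down its learning rate. The toy model, the split point, and the 0.1x scaling factor are illustrative assumptions.

```python
# Hedged sketch of the RAT prototype: front part optimized as usual,
# latter part regularized (here via a smaller learning rate, which under
# SGD is equivalent to scaling its gradients). All specifics are assumptions.
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(               # stand-in for e.g. PreAct ResNet-18
    nn.Conv2d(3, 64, 3, padding=1),  # front part: optimized as usual
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(64 * 32 * 32, 10),     # latter part: optimization regularized
)
front_params = list(model[0].parameters())
latter_params = list(model[3].parameters())

optimizer = optim.SGD(
    [
        {"params": front_params, "lr": 0.1},
        {"params": latter_params, "lr": 0.1 * 0.1},  # assumed scaling factor
    ],
    momentum=0.9,
    weight_decay=5e-4,
)

criterion = nn.CrossEntropyLoss()
x_adv = torch.randn(8, 3, 32, 32)    # placeholder for PGD adversarial examples
y = torch.randint(0, 10, (8,))

optimizer.zero_grad()
loss = criterion(model(x_adv), y)
loss.backward()
optimizer.step()
```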
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=BaoCnmosJz
Changes Since Last Submission: In this new version, our modifications are summarized below:
- Analyse and validate the layer-wise properties of robust overfitting on more diverse network architectures, including PreAct ResNet-18, PreAct ResNet-34, VGG-16, DPN-26 and DLA. The corresponding experimental results are provided in Figure 9.
- Investigate the layer-wise properties of robust overfitting with different adversarial training methods, including standard AT and TRADES. The corresponding experimental results are provided in Figure 10.
- The presentation of the paper has been improved, including but not limited to:
- Correct the citation in the main content;
- Correct the y-axis in Figure 1 from “error” to “accuracy” to use consistent axis names;
- In Section 3.1, add the related reference for the statement “Current works usually study the robust overfitting phenomenon considering the network as a single unit”;
- Add an explanation for Equation (4) that scaling the learning rate and scaling the gradient are equivalent under the SGD optimization method (a one-line illustration is given after this list);
- Add a description for $\ell_i$;
- Correct the use of $j$ as layer index to eliminate misunderstanding;
- Add explanation for the constraints of weight perturbation $v_j$;
- Correct the typo in the caption of Figure 4.
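For reference, the equivalence noted in the Equation (4) item above follows directly from the SGD update rule (generic notation, not necessarily the paper's):

$$\theta_{t+1} = \theta_t - \eta\,(\alpha\, g_t) = \theta_t - (\alpha\eta)\, g_t,$$

i.e., scaling the gradient $g_t$ by a factor $\alpha$ produces the same step as scaling the learning rate $\eta$ by $\alpha$.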
Code: https://github.com/ChaojianYu/Layer-Wise-RO
Assigned Action Editor: ~Kui_Jia1
Submission Number: 2524