Consistency Regularization Helps Mitigate Robust Overfitting in Adversarial Training

2022 (modified: 12 Nov 2022) · KSEM (3) 2022
Abstract: Adversarial training (AT) has been shown to be one of the most effective ways to protect deep neural networks (DNNs) from adversarial attacks. However, the phenomenon of robust overfitting, in which robustness drops sharply at a certain stage of training, always arises during AT. To obtain a robust model, it is important to reduce this robust generalization gap. In this paper, we examine robust overfitting from a new perspective. We observe that consistency regularization, a popular technique in semi-supervised learning, shares similar goals with AT and can help mitigate robust overfitting. We verify this observation empirically and find that most previous solutions are implicitly linked to consistency regularization. Inspired by this, we introduce a new AT solution that integrates consistency regularization and the mean teacher (MT) strategy into AT. Specifically, we maintain a teacher model whose weights are the average of the student model's weights over training steps. We then design a consistency loss that makes the student model's predicted distribution on adversarial examples consistent with the teacher model's predicted distribution on the corresponding clean examples. Experiments show that our proposed method effectively mitigates robust overfitting and improves the robustness of DNN models against common adversarial attacks.
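The two ingredients the abstract describes, an averaged-weight (mean teacher) model and a consistency loss between the student's predictions on adversarial examples and the teacher's predictions on clean examples, can be sketched as follows. This is a minimal illustration only: the function names, the EMA decay value, and the choice of KL divergence as the consistency loss are assumptions for exposition, not details taken from the paper.

```python
import numpy as np

def ema_update(teacher_w, student_w, alpha=0.999):
    """Mean teacher update: teacher weights are an exponential moving
    average of the student weights after each training step (alpha is
    an assumed decay value, not the paper's setting)."""
    return {k: alpha * teacher_w[k] + (1 - alpha) * student_w[k]
            for k in teacher_w}

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def consistency_loss(student_logits_adv, teacher_logits_clean):
    """KL(teacher || student): pushes the student's predicted distribution
    on adversarial inputs toward the teacher's distribution on the
    corresponding clean inputs (one common choice of consistency loss)."""
    p = softmax(teacher_logits_clean)
    q = softmax(student_logits_adv)
    return float(np.mean(
        np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)))
```

In a full AT loop, the student would be trained on the adversarial loss plus this consistency term, and `ema_update` would be applied after each optimizer step; the loss is zero when the two predicted distributions coincide.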