Abstract: The lack of robustness is a serious problem for deep neural networks (DNNs) and makes DNNs vulnerable to adversarial examples. A promising solution is applying adversarial training to alleviate this problem, which allows the model to learn the features from adversarial examples. However, adversarial training usually produces overfitted models and may not work when facing a new attack. We believe this is because the previous adversarial training using cross-entropy loss ignores the similarity between the adversarial examples and the original examples, which will result in a low margin. Accordingly, we propose a supervised adversarial contrastive learning (SACL) approach for adversarial training. SACL uses supervised adversarial contrastive loss which contains both the cross-entropy term and adversarial contrastive term. The cross-entropy term is used for guiding DNN inductive bias learning, and the adversarial contrastive term can help models learn example representations by maximizing feature consistency under different original examples, which fits well with the goal of solving low margins. In addition, SACL only uses adversarial examples which can successfully fool the model and their corresponding original examples for training. This process is more advantageous to provide the model with more accurate information about the decision boundary and obtain a model that fits the example distribution. Experiments show that SACL can reduce the attack success rate of multiple adversarial attack algorithms against different models on text classification tasks. The defensive performance is significantly better than other adversarial training approaches without reducing the generalization ability of the model. In addition, the DNN model trained by our approach has high transferability and robustness.
External IDs:dblp:journals/nca/LiZASJY23
Loading