Keywords: adversarial training, adversarial robustness, relationship knowledge distillation
Abstract: In this study, we revisit the representation learning problem for adversarial training from the perspective of relation preservation. Typical adversarial training methods tend to pull clean and adversarial samples closer together to improve robustness. However, our experimental analysis reveals that this operation leads to cluttered feature representations, which decreases accuracy on both clean and adversarial samples. To alleviate this problem, we build a robust, discriminative feature space for both clean and adversarial samples by incorporating a relational prior that preserves the relationships between features of clean samples. We propose a flexible relationship preserving adversarial training (FRPAT) strategy that transfers the well-generalized relational structure of the standard training model into the adversarial training model. Moreover, it acts mathematically as an extra regularization term, which makes it easy to combine with various popular adversarial training algorithms in a plug-and-play way to achieve the best of both worlds. Extensive experiments on CIFAR10 and CIFAR100 demonstrate the superiority of our algorithm. Without additional data, it improves clean generalizability by up to $\textbf{8.78\%}$ and robust generalizability by up to $\textbf{3.04\%}$ on these datasets.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Social Aspects of Machine Learning (e.g., AI safety, fairness, privacy, interpretability, human-AI interaction, ethics)
Supplementary Material: zip
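Illustrative sketch: The abstract describes the method as an extra regularization term that transfers the relational structure of a standard-trained model into the adversarially trained model. The exact FRPAT loss is not given on this page, so the following is only a minimal sketch of one common way to express such a term, a distance-wise relational distillation penalty. All names here (`pairwise_distances`, `relation_loss`, `total_loss`, `weight`) are hypothetical, and the choice of normalized Euclidean distances is an assumption rather than the authors' formulation.

```python
import math


def pairwise_distances(features):
    """Euclidean distance between every pair of feature vectors."""
    n = len(features)
    return [[math.dist(features[i], features[j]) for j in range(n)]
            for i in range(n)]


def normalize(dist):
    """Divide by the mean off-diagonal distance so that a pure scale
    difference between the two models' feature spaces is ignored.
    Assumes at least two samples in the batch."""
    n = len(dist)
    total = sum(dist[i][j] for i in range(n) for j in range(n) if i != j)
    mean = total / (n * (n - 1))
    if mean == 0:
        return dist
    return [[d / mean for d in row] for row in dist]


def relation_loss(teacher_feats, student_feats):
    """Mean squared difference between the normalized pairwise-distance
    matrices. teacher_feats: features of clean samples from the standard
    model (assumed fixed); student_feats: features of the same batch from
    the adversarially trained model."""
    t = normalize(pairwise_distances(teacher_feats))
    s = normalize(pairwise_distances(student_feats))
    n = len(t)
    return sum((t[i][j] - s[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)


def total_loss(adv_loss, teacher_feats, student_feats, weight=1.0):
    """Plug-and-play combination: any adversarial training loss plus the
    weighted relational term (weight is a hypothetical hyperparameter)."""
    return adv_loss + weight * relation_loss(teacher_feats, student_feats)
```

In this sketch the penalty is zero when the student reproduces the teacher's relative geometry, even at a different scale, and grows when samples that the teacher keeps apart collapse together in the student's feature space.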