ProFeAT: Projected Feature Adversarial Training for Self-Supervised Learning of Robust Representations

24 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Self-supervised Adversarial Training, Adversarial Training, Adversarial Robustness, Contrastive Learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: ProFeAT bridges the performance gap between self-supervised and supervised adversarial training methods by introducing a projection head alongside appropriate training losses and augmentations in a distillation framework.
Abstract: Supervised adversarial training has been the most successful approach for improving the robustness of Deep Neural Networks against adversarial attacks. While several recent works have attempted to overcome the need for supervision or labeled training data by integrating adversarial training with contrastive Self-Supervised Learning (SSL) approaches such as SimCLR, their performance has been sub-optimal due to the increased training complexity. A recent approach mitigates this by utilizing supervision from a standard self-supervised pre-trained model in a teacher-student setting that mimics supervised adversarial training. However, we find that there is still a large gap in performance when compared to supervised training, particularly on larger-capacity models. We show that this is a result of a mismatch between the training objectives of the teacher and student, and propose Projected Feature Adversarial Training (ProFeAT) to bridge this gap by using a projection head in the adversarial training step. We further propose appropriate attack and defense losses at the feature and projector spaces, coupled with a combination of weak and strong augmentations for the teacher and student respectively, to improve generalization without increasing the training complexity. We demonstrate significant improvements in performance when compared to existing SSL methods, and performance on par with TRADES, a popular supervised adversarial training method, on several benchmark datasets and models.
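Below is a minimal, hypothetical sketch of the kind of teacher-student adversarial training step the abstract describes: a frozen self-supervised teacher fed a weakly augmented view, a student and projection head trained on a strongly augmented view, an adversarial attack crafted against the distillation objective, and a defense loss combining feature-space and projector-space terms. The specific loss (cosine-similarity distillation), the PGD-style attack, the names (`profeat_step`, `student`, `projector`, `teacher`), and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def cosine_loss(a, b):
    """Distillation loss: 1 - cosine similarity between feature vectors."""
    return (1.0 - F.cosine_similarity(a, b, dim=-1)).mean()

def profeat_step(student, projector, teacher, x_weak, x_strong, optimizer,
                 eps=8 / 255, step_size=2 / 255, attack_steps=5):
    """One hypothetical teacher-student adversarial training step.

    `teacher` is a frozen, standard self-supervised pre-trained encoder fed the
    weakly augmented view `x_weak`; `student` and `projector` are trained on the
    strongly augmented view `x_strong`. Augmentations are assumed to be applied
    upstream, and inputs are assumed to lie in [0, 1].
    """
    with torch.no_grad():
        t_feat = teacher(x_weak)  # frozen teacher targets

    # Attack (assumed): perturb the strong view to push the student away from
    # the teacher in both the feature and the projected spaces.
    delta = torch.zeros_like(x_strong).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(attack_steps):
        s_feat = student(x_strong + delta)
        attack_loss = (cosine_loss(s_feat, t_feat)
                       + cosine_loss(projector(s_feat), projector(t_feat)))
        grad = torch.autograd.grad(attack_loss, delta)[0]
        delta = (delta + step_size * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    x_adv = (x_strong + delta).clamp(0, 1).detach()

    # Defense (assumed): match the teacher on the clean view in projector space,
    # and keep adversarial features close to clean and teacher features.
    s_clean, s_adv = student(x_strong), student(x_adv)
    defense_loss = (cosine_loss(projector(s_clean), projector(t_feat))
                    + cosine_loss(s_adv, s_clean)
                    + cosine_loss(s_adv, t_feat))
    optimizer.zero_grad()
    defense_loss.backward()
    optimizer.step()
    return defense_loss.item()
```

The split of loss terms between the feature and projector spaces, and the use of the same projector for teacher and student features, are design guesses made only to show how the pieces described in the abstract could fit together in a single training step.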
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9090