ProFeAT: Projected Feature Adversarial Training for Self-Supervised Learning of Robust Representations

Published: 28 Jun 2024, Last Modified: 25 Jul 2024
Venue: NextGenAISafety 2024 Poster
License: CC BY 4.0
Keywords: Self-supervised Adversarial Training, Adversarial Training, Adversarial Robustness, Contrastive Learning
TL;DR: ProFeAT bridges the performance gap between self-supervised and supervised adversarial training methods by introducing a projection head alongside appropriate training losses and augmentations in a distillation framework.
Abstract: The need for abundant labelled data for supervised Adversarial Training (AT) has prompted the use of Self-Supervised Learning (SSL) techniques with AT. The direct application of existing SSL methods to adversarial training has been sub-optimal due to the increased training complexity of combining SSL with AT. A recent approach, DeACL, mitigates this by utilizing supervision from a standard SSL teacher in a distillation setting, to mimic supervised AT. However, we find that there is still a large performance gap when compared to supervised adversarial training, particularly for larger model capacities. We show that this results from a mismatch in the training objectives of the teacher and student, and propose Projected Feature Adversarial Training (ProFeAT) to bridge this gap. We utilize a projection head in the adversarial training step with appropriate attack and defense losses at the feature and projector, coupled with a combination of weak and strong augmentations for the teacher and student respectively, to improve both clean and robust generalization. Through extensive experiments on several benchmark datasets and models, we demonstrate significant improvements in performance when compared to existing SSL-AT methods, setting a new state-of-the-art. We further report on-par or improved performance when compared to TRADES, a popular supervised-AT method.
Submission Number: 119
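
The abstract mentions defense losses applied at both the feature and the projector level in a teacher-student distillation setting, but does not give their exact form. The following is only a minimal sketch of that idea under stated assumptions: the cosine-based distance, the function names, and the `proj_weight` mixing parameter are all hypothetical choices for illustration, not the authors' formulation. The adversarial attack step and the augmentations are omitted.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def distillation_loss(teacher_vec, student_vec):
    """Assumed distance: 1 - cosine similarity (0 when directions match)."""
    return 1.0 - cosine_similarity(teacher_vec, student_vec)


def combined_defense_loss(t_feat, s_feat, t_proj, s_proj, proj_weight=0.5):
    """Mix a feature-level term and a projector-level term.

    t_feat / s_feat: teacher and student backbone features.
    t_proj / s_proj: teacher and student projection-head outputs.
    proj_weight: hypothetical mixing coefficient in [0, 1].
    """
    feat_term = distillation_loss(t_feat, s_feat)
    proj_term = distillation_loss(t_proj, s_proj)
    return (1.0 - proj_weight) * feat_term + proj_weight * proj_term
```

In this sketch the teacher vectors would come from a weakly augmented input and the student vectors from a strongly augmented (and, during training, adversarially perturbed) input, following the pairing described in the abstract.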