A Study of the Effects of Transfer Learning on Adversarial Robustness

Published: 31 May 2024, Last Modified: 31 May 2024. Accepted by TMLR.
Abstract: The security and robustness of AI systems are paramount in real-world applications. Previous research has focused on developing methods to train robust networks, assuming the availability of sufficient labeled training data. However, in deployment scenarios with limited training data, existing techniques for training robust networks become impractical. In such low-data scenarios, non-robust training methods often resort to transfer learning: pre-training a network on a large, possibly labeled dataset and fine-tuning it for a new task with a limited set of training samples. The efficacy of transfer learning in enhancing adversarial robustness has not been comprehensively explored. Specifically, it remains uncertain whether transfer learning can improve adversarial performance in low-data scenarios. Furthermore, the potential benefits of transfer learning for certified robustness remain unexplored. In this paper, we conduct an extensive analysis of the impact of transfer learning on both empirical and certified adversarial robustness. Employing supervised and self-supervised pre-training methods and fine-tuning across 12 downstream tasks representing diverse data availability scenarios, we identify the conditions conducive to training adversarially robust models through transfer learning. Our study reveals that the effectiveness of transfer learning in improving adversarial robustness is attributable to an increase in standard accuracy, not to a direct "transfer" of robustness from the source to the target task, contrary to previous beliefs. Our findings provide valuable insights for practitioners aiming to deploy robust ML models in their applications.
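To give a concrete picture of the kind of empirical robustness evaluation the abstract refers to, the sketch below (our illustration, not the paper's released code) trains a linear classifier on synthetic data, standing in for a fine-tuned head on a downstream task, and compares clean accuracy with accuracy under a single-step FGSM perturbation. The data, model, and epsilon are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data standing in for a low-data downstream task.
X = rng.normal(size=(200, 10))
w_true = rng.normal(size=10)
y = (X @ w_true > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# "Fine-tune" a logistic-regression head by plain gradient descent.
w = np.zeros(10)
for _ in range(500):
    p = sigmoid(X @ w)
    w -= 0.1 * X.T @ (p - y) / len(y)

def fgsm(X, y, w, eps):
    # Single-step FGSM: perturb each input in the direction of the sign of
    # the input-gradient of the logistic loss, which here is (p - y) * w.
    p = sigmoid(X @ w)
    grad = (p - y)[:, None] * w[None, :]
    return X + eps * np.sign(grad)

def accuracy(X, y, w):
    return float(((X @ w > 0).astype(float) == y).mean())

clean_acc = accuracy(X, y, w)
adv_acc = accuracy(fgsm(X, y, w, eps=0.3), y, w)
print(f"clean accuracy: {clean_acc:.2f}, FGSM accuracy: {adv_acc:.2f}")
```

The gap between the two numbers is the empirical-robustness quantity the paper studies at scale, using stronger attacks (e.g. Square, a black-box attack) and real pre-trained networks rather than this toy linear model.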
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:

### Summary of Changes

**1. Limitations Section**

As instructed by the AE, we have added Section 6 to address the limitations of the work presented in the paper. Specifically, we have addressed the following limitations:
- lack of results using transformer-based models
- lack of results using non-contrastive self-supervised learning methods
- lack of empirical adversarial robustness results using the more popular ℓ∞ threat model

**2. Square Attack Results**

As discussed in the rebuttal, to improve the thoroughness of our empirical robustness evaluations we have included results for the Square attack (a black-box attack). These results have been added to Tables 4 and 6, and the associated discussions have been added to the respective sections.

**3. Other Minor Changes**

The following changes were made in accordance with the rebuttal response:
- properly detailed the attack hyperparameters used during evaluation (updated 'Evaluation' in Sec 4.1)
- added a discussion of the work by Yamada et al., using it to corroborate our findings (2nd para of Sec 5.1)
- added missing citations noted by the reviewers
- changed Tables 4, 5, and 6 to highlight relative improvements (to enhance readability)
Video: https://drive.google.com/drive/folders/1DLtUMGWvFUY12YxKAjMmYHOx6BRtWQql?usp=share_link
Code: https://github.com/Ethos-lab/transfer_learning_for_adversarial_robustness
Assigned Action Editor: ~Furong_Huang1
Submission Number: 2085