Pre-trained Transformers as Plug-in Defenders Against Adversarial Perturbations

Kai Wu; Yujian Betterest Li; Jian Lou; Xiaoyu Zhang; Handing Wang; Jing Liu

Pre-trained Transformers as Plug-in Defenders Against Adversarial Perturbations

Kai Wu, Yujian Betterest Li, Jian Lou, Xiaoyu Zhang, Handing Wang, Jing Liu

22 Sept 2023 (modified: 25 Mar 2024)ICLR 2024 Conference Withdrawn SubmissionEveryoneRevisionsBibTeX

Primary Area: societal considerations including fairness, safety, privacy

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: pre-trained transformers, fine-tuning, adversarial defense

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

TL;DR: Consider pre-trained transformers as plug-in defenders for deployed models to rapid respond to adversarial perturbations.

Abstract: With more and more deep neural networks being deployed as various daily ser- vices, their reliability is essential. It is frightening that deep neural networks are vulnerable and sensitive to adversarial attacks, the most common one of which for the services is evasion-based. Recent works usually strengthen the robustness by adversarial training or leveraging the knowledge of an amount of clean data. However, retraining and redeploying the model need a large computational budget, leading to heavy losses to the online service. In addition, when training, it is likely that only limited adversarial examples are available for the service provider, while much clean data may not be accessible. Based on the analysis on the defense for deployed models, we find that how to rapidly defend against a certain attack for a frozen original service model with limitations of few clean and adversarial examples, which is named as RaPiD (Rapid Plug-in Defender), is really important. Motivated by the generalization and the universal computation ability of pre-trained transformer models, we come up with a new defender method, CeTaD, which stands for Considering Pre-trained Transformers as Defenders. In particular, we evaluate the effectiveness and the transferability of CeTaD in the case of one-shot adversarial examples and explore the impact of different parts of CeTaD as well as training data conditions. CeTaD is flexible for different differentiable service models, and suitable for various types of attacks.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Supplementary Material: pdf

Submission Number: 4257

Loading