Exploring the Universal Vulnerability of Prompt-based Learning Paradigm

Anonymous

16 Jan 2022 (modified: 05 May 2023) · ACL ARR 2022 January Blind Submission · Readers: Everyone
Abstract: The prompt-based learning paradigm bridges the gap between pre-training and fine-tuning, and works effectively under the few-shot setting. However, we find that this learning paradigm inherits vulnerabilities from the pre-training stage, where model predictions can be misled by inserting certain triggers into the text. In this paper, we explore this universal vulnerability by either injecting backdoor triggers into, or searching for adversarial triggers on, pre-trained language models using only plain text. In both scenarios, we demonstrate that our triggers can completely control, or severely degrade, the performance of prompt-based models fine-tuned on arbitrary downstream tasks, reflecting the universal vulnerability of the prompt-based learning paradigm. Further experiments show that adversarial triggers transfer well across language models. We also find that conventionally fine-tuned models are not vulnerable to adversarial triggers constructed from pre-trained language models. We conclude by proposing a potential solution to mitigate these attacks. All code and data will be made public.
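To make the attack surface concrete, the sketch below shows how a trigger inserted into the input of a prompt-based classifier can shift the masked-LM scores of the label words. This is a minimal illustration, not the paper's method: it assumes the HuggingFace transformers library, and the template, the label words ("great"/"terrible"), and the candidate trigger token "cf" are hypothetical placeholders chosen for demonstration.

# Minimal sketch (illustrative only, not the paper's attack): measure how
# prepending a candidate trigger moves the label-word scores of a
# prompt-based sentiment classifier built on a masked language model.
from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def verbalizer_scores(text, template="{} It was [MASK]."):
    # Wrap the input in a prompt template and score label words at [MASK].
    prompt = template.format(text)
    inputs = tokenizer(prompt, return_tensors="pt")
    # Position of the [MASK] token in the tokenized sequence.
    mask_idx = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_idx]
    # Label words mapping classes to vocabulary tokens (a common choice).
    label_words = ["great", "terrible"]
    ids = tokenizer.convert_tokens_to_ids(label_words)
    return {w: round(logits[i].item(), 2) for w, i in zip(label_words, ids)}

clean = "The movie was a delight from start to finish."
print("clean:    ", verbalizer_scores(clean))
# Prepend a hypothetical trigger token and observe how the scores move.
print("triggered:", verbalizer_scores("cf cf " + clean))

Because the prompt routes the prediction through the pre-trained model's masked-LM head, a trigger that distorts the scores of the label words affects any downstream task that reuses the same backbone, which is why the paper characterizes the vulnerability as universal.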
Paper Type: long