TROJFSL: TROJAN INSERTION IN FEW SHOT PROMPT LEARNING

22 Sept 2023 (modified: 25 Mar 2024)ICLR 2024 Conference Withdrawn SubmissionEveryoneRevisionsBibTeX
Keywords: Pre-trained Language Model, Few-Shot, Prompt, Trojan Attack
TL;DR: Our work pioneers the investigation of Trojan attacks on few-shot prompt-learning pre-trained models, highlighting their security vulnerabilities.
Abstract: Prompt-tuning emerges as one of the most effective solutions to adapting a pre-trained language model (PLM) to processing new downstream natural language processing tasks, especially with only few input samples. The success of prompt-tuning motivates adversaries to create backdoor attacks against prompt-tuning. However, prior prompt-based backdoor attacks cannot be implemented through few-shot prompt-tuning, i.e., they require either a full-model fine-tuning or a large training dataset. We find it is difficult to build a prompt-based backdoor via few-shot prompt-tuning, i.e., freezing the PLM and tuning a soft prompt with a limited set of input samples. A backdoor design via few-shot prompt-tuning introduces an imbalanced poisoned dataset, easily suffers from the overfitting issue, and lack attention awareness. To mitigate these issues, we propose TrojFSL to perform backdoor attacks in the setting of few-shot prompt-tuning. TrojFSL consists of three modules, i.e., balanced poison learning, selective token poisoning, and trojan-trigger attention. Compared to prior prompt-based backdoor attacks, TrojFSL improves the ASR by 9% - 48% and the CDA by 4% - 9% across various PLMs and a wide range of downstream tasks.
Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4392
Loading