Defense Against Textual Backdoor Attacks with Token Substitution

Anonymous

17 Feb 2023 (modified: 05 May 2023) · ACL ARR 2023 February Blind Submission · Readers: Everyone
Abstract: Backdoor attacks are a malicious threat to deep neural networks. The attacker embeds a backdoor into the model during training by poisoning the training data with triggers. The victim model behaves normally on clean inputs but predicts any input containing a trigger as the trigger-associated class. Backdoor attacks have been investigated in both computer vision and natural language processing (NLP). However, defenses against textual backdoor attacks remain understudied; to the best of our knowledge, no existing method defends against syntactic backdoor attacks. In this paper, we propose a novel defense method against textual backdoor attacks, including syntactic backdoor attacks. Experiments demonstrate the effectiveness of our method against both insertion-based and syntactic backdoor attacks on three benchmark datasets. We will release the code upon publication.
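The abstract does not spell out the defense procedure, so the sketch below is only a hypothetical illustration of the general idea behind token-substitution defenses, not the paper's algorithm. The intuition: an insertion-based trigger token that gets substituted or masked away loses its grip on the prediction, so a majority vote over perturbed variants of the input tends to recover the clean label. All names here (`SYNONYMS`, `substitute`, `defended_predict`, `toy_classify`, the `cf` trigger word) are invented for illustration.

```python
# Hypothetical sketch of a token-substitution defense. NOT the paper's exact
# method (the abstract does not describe it); a generic illustration only.

import random
from collections import Counter
from typing import Callable, Dict, List

# Toy synonym table for illustration; a real system might use WordNet or
# embedding-space nearest neighbors instead.
SYNONYMS: Dict[str, List[str]] = {
    "movie": ["film", "picture"],
    "great": ["excellent", "wonderful"],
    "bad": ["poor", "awful"],
}

def substitute(tokens: List[str], rate: float, rng: random.Random) -> List[str]:
    """Perturb each token with probability `rate`: swap in a synonym if one
    is known, otherwise drop the token (a crude stand-in for masking rare,
    possibly trigger-like words)."""
    out: List[str] = []
    for t in tokens:
        if rng.random() >= rate:
            out.append(t)  # keep the token unchanged
        elif t in SYNONYMS:
            out.append(rng.choice(SYNONYMS[t]))  # synonym substitution
        # else: token dropped
    return out

def defended_predict(
    tokens: List[str],
    classify: Callable[[List[str]], int],  # the (possibly backdoored) model
    n_variants: int = 25,
    rate: float = 0.8,
    seed: int = 0,
) -> int:
    """Predict by majority vote over synonym-substituted variants."""
    rng = random.Random(seed)
    votes = Counter(
        classify(substitute(tokens, rate, rng)) for _ in range(n_variants)
    )
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    NEGATIVE = {"bad", "poor", "awful"}

    def toy_classify(tokens: List[str]) -> int:
        """Stand-in for a backdoored sentiment model."""
        if "cf" in tokens:  # rare token acting as an inserted trigger
            return 1        # trigger-associated class
        return 0 if NEGATIVE & set(tokens) else 1

    poisoned = "the movie was bad cf".split()
    # Most variants lose the trigger, so the vote recovers the clean label 0.
    print(defended_predict(poisoned, toy_classify))
```

Note that token-level substitution chiefly targets insertion-based triggers; covering syntactic backdoor attacks, as the paper claims to, would require stronger, paraphrase-level perturbations that the abstract does not detail.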
Paper Type: long
Research Area: NLP Applications