Defense Against Textual Backdoor Attacks with Token Substitution

Anonymous

16 Oct 2022 (modified: 05 May 2023) · ACL ARR 2022 October Blind Submission · Readers: Everyone
Keywords: backdoor attack, textual backdoor attack defense, syntactic backdoor attack
Abstract: Backdoor attacks are a malicious threat to deep neural networks. The attacker embeds a backdoor into the model during training by poisoning the training data with triggers. The victim model behaves normally on clean data but predicts any input containing a trigger as the trigger-associated class. Backdoor attacks have been investigated in both computer vision and natural language processing (NLP). However, defenses against textual backdoor attacks in NLP remain understudied; to the best of our knowledge, no existing method defends against syntactic backdoor attacks. In this paper, we propose a novel defense against textual backdoor attacks, including syntactic ones. Experiments demonstrate the effectiveness of our method against two state-of-the-art textual backdoor attacks on three benchmark datasets. We will release the code once the paper is published.
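The abstract names the mechanism (token substitution) but gives no details, so the following is only a minimal sketch of what such a defense could look like. It assumes the defense perturbs a suspect input with synonym substitutions and flags inputs whose predicted label stays unusually stable, on the intuition that a lexical or syntactic trigger keeps dominating the prediction even after many content tokens are replaced. Every name here (synonym_substitutes, is_suspicious, the WordNet substitution source, the 0.9 stability threshold) is a hypothetical illustration, not the authors' method or API.

```python
# Hypothetical token-substitution defense sketch; see the hedging note
# above. Requires: pip install nltk, then nltk.download("wordnet").
import random
from typing import Callable, List

from nltk.corpus import wordnet


def synonym_substitutes(token: str, k: int = 3) -> List[str]:
    """Collect up to k WordNet synonyms for a token (assumption: the
    defense substitutes synonyms; a masked LM would also fit the
    abstract's description)."""
    candidates = {
        lemma.name().replace("_", " ")
        for synset in wordnet.synsets(token)
        for lemma in synset.lemmas()
        if lemma.name().lower() != token.lower()
    }
    return sorted(candidates)[:k]


def substituted_variants(tokens: List[str], n_variants: int = 10,
                         rate: float = 0.3, seed: int = 0) -> List[List[str]]:
    """Generate variants of the input with a random fraction of tokens
    replaced by synonyms; tokens with no synonyms are left unchanged."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        variant = list(tokens)
        for i, tok in enumerate(tokens):
            subs = synonym_substitutes(tok)
            if subs and rng.random() < rate:
                variant[i] = rng.choice(subs)
        variants.append(variant)
    return variants


def is_suspicious(tokens: List[str],
                  classify: Callable[[str], int],
                  stability_threshold: float = 0.9) -> bool:
    """Flag an input whose label is unusually stable under heavy token
    substitution (assumed intuition: a backdoor trigger, lexical or
    syntactic, survives substitution and pins the prediction)."""
    original = classify(" ".join(tokens))
    variants = substituted_variants(tokens)
    agreement = sum(
        classify(" ".join(v)) == original for v in variants
    ) / len(variants)
    return agreement >= stability_threshold
```

One design consideration under these assumptions: because synonym substitution preserves sentence structure, a syntactic trigger (e.g., a distinctive parse template) survives the perturbation, which is consistent with the abstract's claim of covering syntactic attacks; the released code may implement the detection criterion differently.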
Paper Type: long
Research Area: NLP Applications