Defend Against Textual Backdoor Attacks By Token Substitution

Published: 18 Nov 2022, Last Modified: 05 May 2023
RobustSeq @ NeurIPS 2022 Poster
Keywords: backdoor attack, natural language processing
TL;DR: We propose an algorithm that defends against syntactic backdoor attacks in NLP
Abstract: Backdoor attacks are a type of malicious threat to deep neural networks (DNNs). The attacker injects a trigger into the model during the training process. The victim model behaves normally on data without the trigger but predicts the attacker-specified target on data containing it. Backdoor attacks were first investigated in computer vision, and their study has recently emerged in natural language processing (NLP) as well. However, research on defenses against textual backdoor attacks remains insufficient; in particular, few methods exist to protect against attacks that use syntax as the trigger. In this paper, we propose a novel method that can effectively defend against syntactic backdoor attacks. Experiments on BERT show the effectiveness of our method against syntactic backdoor attacks with five different syntaxes chosen as triggers.
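The paper's title names token substitution as the defense mechanism. As a rough illustration of the general idea (not the authors' exact algorithm), a minimal sketch follows: substituting tokens with synonyms before inference can disrupt a syntactic trigger planted in the input while largely preserving the sentence's meaning. The `SYNONYMS` table and substitution rate here are toy assumptions; a real defense would draw candidates from an embedding space or a thesaurus.

```python
# Hedged sketch of a token-substitution preprocessing defense.
# Assumption: the synonym table below is a toy stand-in, not the
# paper's actual substitution source.
import random

SYNONYMS = {
    "movie": ["film", "picture"],
    "great": ["excellent", "superb"],
    "bad": ["poor", "awful"],
}

def substitute_tokens(sentence, rate=0.5, rng=None):
    """Replace a fraction of known tokens with a random synonym.

    Perturbing surface tokens can break a backdoor trigger that relies
    on a specific syntactic template, at little cost to semantics.
    """
    rng = rng or random.Random(0)  # seeded for reproducibility
    out = []
    for tok in sentence.split():
        if tok in SYNONYMS and rng.random() < rate:
            out.append(rng.choice(SYNONYMS[tok]))
        else:
            out.append(tok)
    return " ".join(out)

# A defended pipeline would then classify the perturbed text, e.g.:
#   prediction = model(substitute_tokens(text))
```

Tokens absent from the table pass through unchanged, so the transformation degrades gracefully on out-of-vocabulary input.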