Natural Language Adversarial Attack and Defense in Word Level

Xiaosen Wang; Hao Jin; Kun He

25 Sept 2019 (modified: 05 May 2023), ICLR 2020 Conference Withdrawn Submission, Readers: Everyone

TL;DR: The first word-level text adversarial defense method against synonym substitution based attacks, and an improved genetic based attack method.

Abstract: Inspired by the large body of research on adversarial examples in computer vision, interest in designing adversarial attacks for Natural Language Processing (NLP) tasks has grown recently, but very few works address adversarial defenses for NLP. To our knowledge, no defense method exists against the successful synonym substitution based attacks, which aim to satisfy lexical, grammatical, and semantic constraints and are thus hard for humans to perceive. We fill this gap by proposing a novel adversarial defense method called Synonym Encoding Method (SEM), which inserts an encoder before the input layer of the model and then trains the model to eliminate adversarial perturbations. Extensive experiments demonstrate that SEM can efficiently defend against the current best synonym substitution based adversarial attacks with little decay in accuracy on benign examples. To better evaluate SEM, we also design a strong attack method called Improved Genetic Algorithm (IGA), which adopts the genetic metaheuristic for synonym substitution based attacks. Compared with the existing genetic based adversarial attack, IGA achieves a higher attack success rate while maintaining the transferability of the adversarial examples.
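The idea of an encoder placed before the input layer can be illustrated with a minimal sketch. It rests on one plausible reading of the abstract, not on the authors' implementation: every word in a synonym cluster is mapped to a single shared representative, so a synonym substitution leaves the model's input unchanged. The cluster list, `build_encoder`, and `encode` are hypothetical names introduced here for illustration; the abstract does not say how synonyms are found or grouped.

```python
from typing import Dict, List

# Hypothetical toy synonym clusters (an assumption for illustration only).
# The abstract does not specify how synonym sets are constructed.
SYNONYM_CLUSTERS: List[List[str]] = [
    ["good", "great", "fine"],
    ["movie", "film", "picture"],
]


def build_encoder(clusters: List[List[str]]) -> Dict[str, str]:
    """Map every word in a cluster to that cluster's first word."""
    table: Dict[str, str] = {}
    for cluster in clusters:
        representative = cluster[0]
        for word in cluster:
            table[word] = representative
    return table


def encode(tokens: List[str], table: Dict[str, str]) -> List[str]:
    """Replace each token by its representative; unknown tokens pass through."""
    return [table.get(token, token) for token in tokens]


ENCODER = build_encoder(SYNONYM_CLUSTERS)

benign = "a good movie".split()
adversarial = "a great film".split()  # synonym-substituted variant

# Both inputs collapse to the same token sequence before reaching the model.
encoded_benign = encode(benign, ENCODER)
encoded_adversarial = encode(adversarial, ENCODER)
```

Under this assumption, a downstream classifier trained on encoded text would see identical inputs for the benign sentence and its synonym-substituted variant, which is the intuition behind eliminating the perturbation before the input layer.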

Keywords: adversarial examples, text adversarial defense, text adversarial attack, synonym encoding, natural language processing
