Abstract: We present GradMask, a simple, model-agnostic scheme for detecting textual adversarial examples. It uses gradient signals to identify adversarially perturbed tokens in an input sequence and occludes such tokens through a masking process. GradMask offers several advantages over existing methods, including improved detection performance and a weak interpretation of its decision. Extensive evaluations on widely adopted natural language processing benchmark datasets demonstrate the efficiency and effectiveness of GradMask.
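The abstract does not spell out the masking procedure, but the core idea (rank tokens by gradient signal, then occlude the most suspicious ones) can be sketched as below. This is a hedged illustration, not the paper's actual algorithm: the function name `gradmask`, the `[MASK]` token, and the use of precomputed per-token gradient norms (in practice these would come from a backward pass through the victim model) are all assumptions.

```python
import numpy as np

MASK_TOKEN = "[MASK]"  # hypothetical occlusion token

def gradmask(tokens, grad_norms, k=1):
    """Occlude the k tokens with the largest gradient magnitude.

    tokens: list of input tokens
    grad_norms: per-token norm of the loss gradient w.r.t. each token's
        embedding (supplied directly here for illustration)
    k: number of tokens to mask
    """
    grad_norms = np.asarray(grad_norms, dtype=float)
    # Indices of the k largest gradient norms (descending order).
    top = np.argsort(grad_norms)[::-1][:k]
    masked = list(tokens)
    for i in top:
        masked[i] = MASK_TOKEN
    return masked, sorted(int(i) for i in top)

# Toy example: the perturbed token "terrlble" shows a gradient spike.
tokens = ["the", "movie", "was", "terrlble", "overall"]
grads = [0.10, 0.20, 0.10, 0.90, 0.15]
masked, idx = gradmask(tokens, grads, k=1)
print(masked, idx)
```

A detector built on this idea could then compare the model's prediction on the original and masked inputs: a large prediction shift after masking suggests the occluded tokens were adversarially perturbed.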