GradMask: Gradient-Guided Token Masking for Textual Adversarial Example DetectionDownload PDF

Anonymous

16 Nov 2021 (modified: 05 May 2023)ACL ARR 2021 November Blind SubmissionReaders: Everyone
Abstract: We present a simple model-agnostic textual adversarial example detection scheme called GradMask. It uses gradient signals to detect adversarially perturbed tokens in an input sequence and occludes such tokens by a masking process. GradMask provides several advantages over existing methods including improved detection performance and a weak interpretation of its decision. Extensive evaluations on widely adopted natural language processing benchmark datasets demonstrate the efficiency and effectiveness of GradMask
0 Replies

Loading