GradMask: Gradient-Guided Token Masking for Textual Adversarial Example Detection

Anonymous

17 Sept 2021 (modified: 05 May 2023) · ACL ARR 2021 September Blind Submission
Abstract: We present a simple, model-agnostic textual adversarial example detection scheme called GradMask. It uses gradient signals to detect adversarially perturbed tokens in an input sequence and occludes such tokens through a masking process. GradMask provides several advantages over existing methods, including lower computational cost, improved detection performance, and a weak interpretation of its decision. Extensive evaluations on widely adopted natural language processing benchmark datasets demonstrate the efficiency and effectiveness of GradMask.
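
Below is a minimal, illustrative sketch of what a gradient-guided token-masking detector of this kind might look like, assuming a Hugging Face/PyTorch sequence classifier. The saliency rule (per-token embedding-gradient norm), the fixed top-k masking budget, the prediction-flip detection criterion, and the checkpoint name are assumptions made for illustration, not the paper's exact recipe.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoint; in practice a classifier fine-tuned on the target
# task (e.g. sentiment classification) would be used.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

def gradmask_detect(text: str, top_k: int = 3) -> bool:
    """Flag `text` as adversarial if occluding its highest-gradient tokens
    changes the model's prediction (an assumed detection criterion)."""
    enc = tokenizer(text, return_tensors="pt")

    # Run the forward pass from the embedding layer so gradients can be taken
    # with respect to the token embeddings.
    embeds = model.get_input_embeddings()(enc["input_ids"]).detach()
    embeds.requires_grad_(True)
    logits = model(inputs_embeds=embeds, attention_mask=enc["attention_mask"]).logits
    pred = int(logits.argmax(dim=-1))

    # Gradient of the predicted-class score w.r.t. each token embedding;
    # the per-token gradient norm serves as the saliency score here.
    logits[0, pred].backward()
    scores = embeds.grad.norm(dim=-1).squeeze(0)

    # Occlude the top-k highest-scoring tokens with the [MASK] token
    # (special tokens are not excluded here, for brevity).
    masked_ids = enc["input_ids"].clone()
    top = scores.topk(min(top_k, scores.numel())).indices
    masked_ids[0, top] = tokenizer.mask_token_id

    with torch.no_grad():
        masked_logits = model(input_ids=masked_ids,
                              attention_mask=enc["attention_mask"]).logits

    # A prediction flip after masking is treated as evidence of perturbation.
    return int(masked_logits.argmax(dim=-1)) != pred

print(gradmask_detect("this movie was surprisingly wonderful"))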