Improving Textual Adversarial Attacks using Metric-Guided Rewrite and Rollback

Anonymous

05 Jun 2022 (modified: 05 May 2023) · ACL ARR 2022 June Blind Submission · Readers: Everyone
Keywords: classification, adversarial attack, robustness
Abstract: Adversarial examples are helpful for analyzing and improving the robustness of text classifiers. Generating high-quality adversarial examples is challenging: the adversarial sentences must be fluent, semantically similar to the originals, and still lead to misclassification. Existing methods prioritize misclassification by maximizing each perturbation's effectiveness at misleading the classifier, so the resulting adversarial examples fall short in fluency and similarity. In this paper, we define a critique score that synthesizes fluency, similarity, and misclassification metrics. We propose a rewrite and rollback (R&R) framework, guided by optimization of this score, to improve adversarial attacks. R&R generates high-quality adversarial examples by exploring perturbations that have no immediate impact on misclassification while optimizing the critique score for better fluency and similarity. We evaluate our method on 5 representative datasets and 3 classifier architectures. It outperforms the current state of the art in attack success rate by +16.2%, +12.8%, and +14.0% on the three classifiers, respectively. All code and results will be publicly available.
Paper Type: long
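
The abstract describes a critique score that combines fluency, similarity, and misclassification, and a rewrite-and-rollback loop guided by that score. The sketch below is a minimal illustration of that idea, not the paper's implementation: the linear weighting, the accept/rollback rule, and the names `propose_rewrite` and `score_fn` are all assumptions made for clarity.

```python
# Hedged sketch of a score-guided rewrite-and-rollback loop.
# The weights and the linear combination are illustrative assumptions,
# not the paper's exact critique score.

def critique_score(fluency: float,
                   similarity: float,
                   misclassification_prob: float,
                   w_fluency: float = 1.0,
                   w_similarity: float = 1.0,
                   w_misclass: float = 1.0) -> float:
    """Higher is better: reward fluent, similar sentences that the
    target classifier misclassifies with high probability."""
    return (w_fluency * fluency
            + w_similarity * similarity
            + w_misclass * misclassification_prob)


def rewrite_and_rollback(sentence, propose_rewrite, score_fn, steps=100):
    """Iteratively rewrite `sentence`, keeping a candidate only when it
    does not decrease the critique score; otherwise roll back to the
    last accepted sentence. `propose_rewrite` and `score_fn` are
    hypothetical callables supplied by the caller."""
    best = sentence
    best_score = score_fn(sentence)
    current = sentence
    for _ in range(steps):
        candidate = propose_rewrite(current)   # e.g. a word-level substitution
        cand_score = score_fn(candidate)
        if cand_score >= best_score:           # accept the rewrite
            current = candidate
            best, best_score = candidate, cand_score
        else:                                  # rollback to the best sentence so far
            current = best
    return best
```

Because acceptance depends on the combined score rather than on each perturbation's immediate effect on the classifier, a loop of this shape can explore rewrites that temporarily do not increase misclassification, which matches the exploration behavior the abstract attributes to R&R.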
