Abstract: Adversarial examples induce errors in a target model while keeping the changes from the original examples barely perceptible to humans. However, low-quality adversarial examples are easily noticed by humans, reducing the effectiveness of the attack. The overall quality of an adversarial example is measured not only by the degree of explicit perturbation and semantic similarity but also by implicit grammatical errors and textual fluency. Existing hard-label attacks often generate candidates without considering context; as a result, the generated adversarial examples are semantically and syntactically flawed, and their quality is poor. In this paper, we propose QAE, a hard-label attack based on a pre-trained masked language model and optimal example selection rules, to comprehensively improve the Quality of the Adversarial Examples. QAE uses a pre-trained masked language model to generate candidates that better match the semantic and syntactic rules of the context and constructs an initial adversarial example by random substitution. It then optimizes the adversarial example with a genetic algorithm that incorporates the optimal example selection rules. Extensive experiments and human evaluation show that QAE generates high-quality adversarial examples with better semantic fluency and fewer grammatical errors while maintaining an attack success rate comparable to existing hard-label attacks.
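To illustrate the candidate-generation step described above, the following is a minimal sketch of context-aware substitution with a pre-trained masked language model, assuming the HuggingFace `transformers` library and `bert-base-uncased`; the helper functions `candidate_substitutes` and `random_substitution` are hypothetical names for illustration, not QAE's actual implementation, and the genetic-algorithm optimization stage is omitted.

```python
# Minimal sketch: MLM-based candidate generation followed by random
# substitution, assuming HuggingFace transformers and bert-base-uncased.
# Helper names here are illustrative, not from the QAE paper.
import random
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def candidate_substitutes(tokens, position, top_k=10):
    """Mask one token and let the MLM propose context-aware replacements."""
    masked = tokens.copy()
    masked[position] = fill_mask.tokenizer.mask_token
    predictions = fill_mask(" ".join(masked), top_k=top_k)
    # Keep only candidates that actually differ from the original word,
    # so each substitution is a real perturbation.
    return [p["token_str"] for p in predictions
            if p["token_str"].lower() != tokens[position].lower()]

def random_substitution(tokens, positions, top_k=10):
    """Build an initial adversarial example by randomly swapping in
    MLM-proposed candidates at the chosen positions."""
    adv = tokens.copy()
    for pos in positions:
        candidates = candidate_substitutes(adv, pos, top_k)
        if candidates:
            adv[pos] = random.choice(candidates)
    return adv

# Example: propose context-fitting replacements for "movie".
sentence = "the movie was absolutely wonderful".split()
print(candidate_substitutes(sentence, 1))
```

Because the masked language model scores replacements conditioned on the surrounding sentence, the candidates it proposes tend to preserve grammaticality and fluency, which is the property the abstract attributes to QAE's candidate generation.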