FEAttack: A Fast and Efficient Hard-Label Textual Attack Framework

Published: 01 Jan 2024 · Last Modified: 13 May 2025 · WASA (2) 2024 · CC BY-SA 4.0
Abstract: Hard-label textual attacks are realistic but challenging because they can rely only on the predicted label as a guide for generating adversarial examples. Although existing methods can successfully generate adversarial examples, the complex heuristics and inaccurate gradient estimation they rely on require a large number of queries to the victim model during example optimization. Moreover, most model providers limit the number of queries to a model for security reasons, so existing attacks cannot generate high-quality adversarial examples under this constraint. In this paper, we propose FEAttack, a Fast and Effective hard-label textual Attack framework for generating high-quality adversarial examples under a low query budget. First, FEAttack generates multiple initial adversarial examples to stabilize quality. It then adopts a two-stage optimization strategy: restoring original words to reduce the perturbation rate, and greedily substituting synonyms to improve semantic similarity. Extensive experiments on three typical natural language processing models and a real API show that FEAttack quickly generates high-quality adversarial examples with higher semantic similarity and a lower perturbation rate under a low query budget.
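The two-stage optimization described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `toy_victim` classifier, the `SYNONYMS` table, and the greedy acceptance rule are all simplifying assumptions (a real attack would query a black-box NLP model and rank candidate synonyms with a sentence-similarity encoder).

```python
def toy_victim(tokens):
    # Hypothetical black-box victim: returns only a hard label
    # (1 if the word "good" appears, else 0). Stands in for a real model API.
    return 1 if "good" in tokens else 0

# Toy synonym table; a real attack would use a resource such as WordNet
# or counter-fitted embeddings (assumption for illustration).
SYNONYMS = {"good": ["fine", "nice"], "movie": ["film"]}

def two_stage_optimize(adv, original, victim, target_label, budget=100):
    """Sketch of a two-stage hard-label refinement under a query budget.

    Stage 1: try restoring each perturbed position to its original word,
             keeping the change only if the example stays adversarial
             (lowers the perturbation rate).
    Stage 2: greedily swap the remaining substitutions for alternative
             synonyms of the original word (a proxy for improving
             semantic similarity; a real system would score candidates
             with a similarity model).
    """
    adv = list(adv)
    queries = 0

    # Stage 1: reduce perturbation by restoring original words.
    for i, (a, o) in enumerate(zip(adv, original)):
        if a != o and queries < budget:
            trial = adv.copy()
            trial[i] = o
            queries += 1
            if victim(trial) == target_label:
                adv = trial  # restoring this word keeps the attack successful

    # Stage 2: greedily replace remaining substitutions with other synonyms.
    for i, (a, o) in enumerate(zip(adv, original)):
        if a != o:
            for syn in SYNONYMS.get(o, []):
                if syn == a or queries >= budget:
                    continue
                trial = adv.copy()
                trial[i] = syn
                queries += 1
                if victim(trial) == target_label:
                    adv = trial  # accept first synonym that stays adversarial
                    break

    return adv, queries
```

For example, starting from the adversarial example `["a", "fine", "film"]` against the original `["a", "good", "movie"]` (original label 1, target label 0), stage 1 restores "movie" and stage 2 swaps "fine" for another synonym, ending with only one perturbed word while the example remains adversarial.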