Keywords: Text Adversarial Attacks, Trustworthy artificial intelligence
Abstract: Textual adversarial attacks confuse Natural Language Processing (NLP) models,
such as Large Language Models (LLMs), through subtle modifications to the text that
cause incorrect decisions. Although existing adversarial attacks are effective, they
typically require knowledge of the victim model, extensive queries, or access to its
training data, which limits their real-world applicability. For settings in which the
attacker has neither knowledge of nor access to the victim model, we introduce the Free
Lunch Adversarial Attack (FLA), demonstrating that an attacker can succeed
armed only with the victim texts. To avoid any access to the victim
model, we build a shadow dataset using publicly available pre-trained models and
clustering methods as a foundation for developing substitute models. To address
the low attack success rate (ASR) due to insufficient information feedback, we
propose a hierarchical substitution model design, generating substitute models
that approximate the victim's decision boundaries to raise the ASR. Concurrently,
we employ diverse adversarial example generation with multiple attack methods,
reducing how often the substitute models must be retrained and balancing effectiveness with efficiency.
Experiments on the Emotion and SST5 datasets show that FLA outperforms
existing state-of-the-art methods while reducing the attack cost to
zero. More importantly, we find that FLA poses a significant threat to LLMs
such as Qwen2 and the GPT family, achieving an ASR of up to 45.99% even
without API access, confirming that advanced NLP models still face serious
security risks.
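To make the shadow-dataset step concrete, the following is a minimal illustrative sketch, not the authors' implementation: unlabeled victim texts are embedded (a toy hashing embedding stands in for a public pre-trained encoder), clustered with k-means, and the cluster IDs serve as pseudo-labels for training a substitute model. All names and parameters here are assumptions.

```python
# Hypothetical sketch of building a "shadow dataset" from victim texts alone:
# embed texts, cluster them, and treat cluster IDs as pseudo-labels.
import hashlib
import numpy as np

def embed(text, dim=16):
    """Toy deterministic bag-of-words embedding (stand-in for a
    pre-trained encoder such as a public sentence embedder)."""
    vec = np.zeros(dim)
    for tok in text.lower().split():
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means: assign each point to its nearest center,
    then recompute centers, for a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Unlabeled victim texts (illustrative examples).
texts = ["i love this movie", "what a great film",
         "terrible awful plot", "awful terrible acting"]
X = np.stack([embed(t) for t in texts])
pseudo_labels = kmeans(X, k=2)  # cluster IDs act as shadow labels
shadow_dataset = list(zip(texts, pseudo_labels.tolist()))
```

A substitute model trained on such pseudo-labeled pairs can then be attacked in place of the inaccessible victim, which is the role the shadow dataset plays in the pipeline described above.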
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3864