Keywords: Large Language Model, Prompt Injection Attack, LLM Agent
TL;DR: We introduce an automatic black-box indirect prompt injection attack against LLMs and LLM agents.
Abstract: Although large language models (LLMs) and LLM agents have been widely adopted, they are vulnerable to indirect prompt injection attacks, in which malicious external data is injected to manipulate model behavior. Existing evaluations of LLM robustness against such attacks are limited by handcrafted attack methods and by reliance on white-box or gray-box access, conditions that are unrealistic in practical deployments. To bridge this gap, we propose AutoHijacker, an automatic black-box indirect prompt injection attack. Built on the concept of LLM-as-optimizers, AutoHijacker introduces a batch-based optimization framework to handle sparse feedback and leverages a trainable memory to generate effective indirect prompt injections without continuous querying. Evaluations on two public benchmarks, AgentDojo and Open-Prompt-Injection, show that AutoHijacker outperforms 11 baseline attacks and achieves state-of-the-art performance without requiring external knowledge such as user instructions or model configurations. It also achieves higher average attack success rates against eight different defenses. Additionally, AutoHijacker successfully attacks a commercial LLM agent platform, achieving a 71.9% attack success rate on both document interaction and website browsing tasks.
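To make the abstract's attack loop concrete, below is a minimal sketch of a batch-based LLM-as-optimizer cycle with a trainable memory, under my own assumptions rather than the paper's actual implementation. All names here (attacker_llm, evaluate_batch, the prompt wording) are hypothetical placeholders.

```python
import random

def attacker_llm(prompt: str, n: int) -> list[str]:
    # Placeholder: in practice, query an attacker-side LLM with `prompt`
    # to propose n candidate injection strings.
    return [f"injection-candidate-{random.random():.4f}" for _ in range(n)]

def evaluate_batch(candidates: list[str]) -> list[float]:
    # Placeholder: in practice, plant each candidate in external data
    # (a document, a web page) and record black-box attack success
    # against the target LLM/agent, e.g. 0.0 or 1.0 per candidate.
    return [random.random() for _ in candidates]

def optimize(num_rounds: int = 10, batch_size: int = 8) -> list[tuple[float, str]]:
    """Batch-based optimization with a memory of (score, injection) pairs."""
    memory: list[tuple[float, str]] = []

    for _ in range(num_rounds):
        # Condition the optimizer on the best injections found so far,
        # so it gets useful signal even when individual attempts rarely
        # succeed (the sparse-feedback problem).
        top = "\n".join(inj for _, inj in sorted(memory, reverse=True)[:5])
        prompt = (
            "Previously effective injections:\n" + top +
            f"\nPropose {batch_size} improved indirect prompt injections."
        )
        candidates = attacker_llm(prompt, batch_size)

        # Score the whole batch at once; batch-level feedback smooths
        # out the per-candidate sparsity.
        scores = evaluate_batch(candidates)
        memory.extend(zip(scores, candidates))

    # The accumulated memory can then be reused to generate injections
    # for new targets without further querying.
    return sorted(memory, reverse=True)

if __name__ == "__main__":
    for score, injection in optimize()[:3]:
        print(f"{score:.2f}  {injection}")
```

This is only an illustration of the batch-plus-memory idea named in the abstract; the paper's actual prompts, scoring, and memory-training procedure may differ substantially.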
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11629