Keywords: prompt injection, LLM agents, tool-augmented LLMs, adversarial robustness, adaptive attacks, AI safety, model evaluation, AgentDojo, attack surfaces, security
Abstract: Tool-augmented LLM agents are vulnerable to prompt injection: a third party who controls part of the agent's context can plant instructions that the agent then executes as if they came from the user. Current evaluations report a single attack success rate (ASR) per model on one channel, the tool output, and treat that number as the model's vulnerability. But tool descriptions, which the agent reads at every turn before any tool is called, are themselves an injection surface that the attacker can choose instead. We hold the injection payload byte-identical and deliver it through both surfaces across 13 LLMs from six families and four task suites. The success rate of the same bytes inverts across models: \textsc{GPT-4.1} is 96\% vulnerable on tool outputs but only 4\% on tool descriptions, while \textsc{Gemini-3-Flash} shows the mirror pattern at 20\% and 98\%. A variance decomposition over 6{,}830 attempts attributes $0\%$ of the variation in attack outcomes to the surface alone, while the model$\times$surface interaction accounts for $16.7\%$. Vulnerability is a property of the model--surface pairing, not of the channel. The Adaptive Attack Rate, defined as the per-cell maximum over surfaces, exceeds the strongest fixed-surface baseline by $9.1$ percentage points on average. Standard prompt-level defenses inherit the same blind spot, reducing tool-output ASR to 10--18\% while leaving the description channel above 54\%. Both attack and defense evaluations must report per-surface vulnerability.
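The Adaptive Attack Rate described in the abstract can be illustrated with a minimal sketch. It uses only the two model figures quoted above, and it assumes that a "cell" is a single model and that rates are averaged with equal weight; the paper's own cells, weighting, and function names may differ, and the surface labels and helper names here are hypothetical. The resulting gain is therefore not comparable to the 9.1-point figure, which is computed over all 13 models and four suites.

```python
from statistics import mean

# ASR (%) per model and surface. Only the two pairs quoted in the abstract;
# the surface keys are illustrative labels, not the paper's identifiers.
asr = {
    "GPT-4.1": {"tool_output": 96.0, "tool_description": 4.0},
    "Gemini-3-Flash": {"tool_output": 20.0, "tool_description": 98.0},
}


def fixed_surface_rate(table, surface):
    """Mean ASR when the attacker always uses the same surface."""
    return mean(cell[surface] for cell in table.values())


def adaptive_attack_rate(table):
    """Mean over cells of the best surface for that cell (per-cell maximum)."""
    return mean(max(cell.values()) for cell in table.values())


surfaces = ["tool_output", "tool_description"]
best_fixed = max(fixed_surface_rate(asr, s) for s in surfaces)
aar = adaptive_attack_rate(asr)
gain = aar - best_fixed  # percentage points gained by choosing per cell

print(f"strongest fixed surface: {best_fixed:.1f}%")
print(f"adaptive attack rate:    {aar:.1f}%")
print(f"gain:                    {gain:+.1f} pp")
```

Because the two models fail on opposite surfaces, either fixed surface misses one of them, while the per-cell maximum captures both. This is the sense in which a single-surface ASR understates what an adaptive attacker can achieve.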
Paper Type: Long
Research Area: LLM agents
Research Area Keywords: Ethics, Bias, and Fairness; Resources and Evaluation; Interpretability and Analysis of Models for NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models, Data analysis
Languages Studied: Eng
EMNLP 2026 AI Reviewing Experiment: yes
Submission Number: 17266