Abstract: Small language models (SLMs) are increasingly deployed for tool calling on edge devices and
in agentic systems, yet their safety under adversarial conditions remains unstudied. Unlike
text generation, tool calling creates a unique attack surface: a single malicious tool call can
trigger irreversible real-world actions such as unauthorized financial transfers, data exfiltration,
or privilege escalation. We present ToolGuard, to our knowledge the first systematic
study of adversarial robustness in SLM tool calling. We contribute: (1) a taxonomy of five
attack categories targeting tool-calling SLMs: parameter injection, tool substitution, privilege
escalation, data exfiltration, and chain attacks; (2) ToolAttackBench, a benchmark
of 50 adversarial prompts across 17 tool schemas in 5 domains; (3) an empirical red-team
evaluation of four SLM families (1B–3B parameters) over 10 independent runs per prompt,
revealing that capable SLMs exhibit attack success rates (ASR) of 47–52%, with a capability
floor for tool-call vulnerability near 1–1.7B parameters; and (4) a runtime defense that
enforces declarative security policies on completed tool calls, cutting mean ASR by 76% in relative
terms (from 48.9% to 11.7%) on the full benchmark and by 78% (from 49.3% to 10.9%) on a held-out
test set (Section 7.2), with a 0% false positive rate on simulated canonical benign outputs
(n=41; 95% Wilson CI: [0%, 8.6%]; per-model FPR on actual model outputs was not measured)
and sub-5ms latency overhead. We find that tool substitution is the most dangerous attack
category (65.4% mean ASR) but is effectively neutralized by intent verification (reduced
to 5.0%), while data exfiltration proves the hardest to defend (44.3% → 31.7%) due to its
reliance on semantically valid tool calls. An adaptive evaluation with 8 policy-aware attacks
confirms that defense effectiveness drops against knowledgeable adversaries (19.0% defended
ASR vs. 10.9% on held-out attacks), underscoring the need for learned defense components.
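
The headline figures above can be reproduced from the numbers stated in the abstract alone. The following is a minimal sketch, not the authors' evaluation code: it assumes the 76% and 78% figures are relative reductions in mean ASR, and that the false-positive interval is a standard two-sided Wilson score interval with z = 1.96 and no continuity correction. The function names `relative_reduction` and `wilson_interval` are illustrative only.

```python
import math
from typing import Tuple


def relative_reduction(before: float, after: float) -> float:
    """Relative reduction, in percent, when a rate drops from `before` to `after`."""
    return 100.0 * (before - after) / before


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Two-sided Wilson score interval for a binomial proportion.

    Assumption: no continuity correction; z = 1.96 for a 95% interval.
    """
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Mean ASR before and after the defense, as reported in the abstract.
    print(f"full benchmark: {relative_reduction(48.9, 11.7):.1f}% relative reduction")
    print(f"held-out set:   {relative_reduction(49.3, 10.9):.1f}% relative reduction")

    # 0 false positives observed among 41 benign outputs.
    low, high = wilson_interval(0, 41)
    print(f"FPR 95% Wilson CI: [{100 * low:.1f}%, {100 * high:.1f}%]")
```

With zero observed false positives the interval is one-sided in practice: the upper bound depends only on the sample size, which is why n=41 cannot rule out a false positive rate of several percent.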
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Kamalika_Chaudhuri1
Submission Number: 9195