Keywords: AI Agent Security, Benchmarking, Prompt Injection, Tool-Using Agents, Static Analysis, Dynamic Evaluation, Malicious Skill Detection
TL;DR: This paper studies malicious AI-agent skills that covertly exfiltrate credentials through ASCII smuggling, encoded payloads, steganography, and RSA-based leakage, and evaluates defenses against them.
Abstract: Agent Skills---structured packages of instructions and scripts that augment LLM-based agents---are rapidly proliferating, yet their security properties remain under-explored.
We present SkillsMetric, a five-stage static analysis framework that scores skill packages along pattern density, statistical anomaly, dataflow taint, import anomaly, and capability mismatch dimensions.
We construct an adversarial evaluation dataset of 2,266 skills spanning 16~attack types across code-level, system-level, and semantic-level threats, and evaluate on the full SkillMD-138K corpus.
Our framework achieves an AUC of 0.93 and 5-fold cross-validated F1 of 73.4\%$\pm$0.5\%, with strong detection of data exfiltration (93\%) and steganographic payloads (93\%).
Crucially, we identify fundamental blind spots: host destruction attacks using common shell commands evade all five stages (0\% detection), and prompt injection via natural-language manipulation achieves only 42\% detection.
These findings establish that static analysis alone is insufficient for skill security, motivating defense-in-depth architectures that combine fast static pre-screening with semantic review.
Presentation Mode: Undecided at this time.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 73
Loading