Keywords: agent skills, agentic AI security, LLM agents, prompt injection, tool poisoning, metadata poisoning, software supply chain, static analysis
TL;DR: SkillSpector is a pre-publication scanner that reviews agent-skill instructions, metadata, permissions, dependencies, and helper code together to make security risk visible before publication or installation.
Abstract: Agent skills package procedural knowledge for large language model (LLM) agents in a form that can be discovered and loaded at inference time. They let agents acquire domain-specific workflows without model retraining, but they also create a new software supply-chain surface: a skill may combine natural-language instructions, activation metadata, permission declarations, dependencies, and executable helper code. Existing scanners can inspect code or dependencies, but they rarely reason across model-visible metadata, declared permissions, natural-language instructions, and bundled implementation.
We present SkillSpector, a pre-publication security scanner for agent skills. SkillSpector normalizes a skill bundle into a shared state and runs static pattern detectors, abstract syntax tree (AST) and taint analyzers, manifest consistency checks, metadata poisoning checks, supply-chain checks, optional LLM-based semantic analyzers, and report generation. Its taxonomy is grounded in Open Worldwide Application Security Project (OWASP) guidance for LLM and agentic artificial intelligence (AI) applications, MITRE Adversarial Threat Landscape for Artificial-Intelligence Systems (ATLAS) adversarial categories, and Open Source Vulnerabilities (OSV.dev) dependency vulnerability data. The implementation exposes terminal, JavaScript Object Notation (JSON), Markdown, and Static Analysis Results Interchange Format (SARIF) output; a risk score from 0 to 100; and 64 rule patterns across 16 categories. We position deterministic analyzers as the enforcement core and LLM-backed analyzers as advisory signals for human review. Our evaluation has three parts: functional validation exercises distinct skill risk surfaces; an unlabeled 178-skill field exercise checks operability on real skill layouts; and a 1,058-trace internal CI/CD analysis characterizes finding distribution and reviewer workload on a benign-skewed skill catalog. Absent a labeled corpus, we do not claim precision, recall, or ecosystem-scale accuracy. The goal is not to prove skill safety, but to make skill risk visible, reviewable, and actionable before publication or installation. The main contribution is a practical control point that treats instructions, metadata, permissions, dependencies, and code as one review target.
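To make the idea of a shared state and a manifest consistency check concrete, the following is a minimal sketch, not SkillSpector's actual implementation. It assumes helper code is Python and that a skill declares permissions as a set of strings. The names `SkillState`, `MODULE_PERMISSIONS`, `implied_permissions`, `check_manifest_consistency`, and the rule id `undeclared-permission` are all hypothetical and chosen only for illustration.

```python
import ast
from dataclasses import dataclass, field

# Hypothetical mapping from imported top-level modules to the permission
# they imply. A real scanner would need a far richer model than this.
MODULE_PERMISSIONS = {
    "socket": "network",
    "urllib": "network",
    "http": "network",
    "subprocess": "process",
    "shutil": "filesystem",
}


@dataclass
class SkillState:
    """Shared state that each analyzer reads and appends findings to."""
    declared_permissions: set[str]
    helper_sources: dict[str, str]  # file name -> Python source text
    findings: list[dict] = field(default_factory=list)


def implied_permissions(source: str) -> set[str]:
    """Return the permissions implied by the modules a helper file imports."""
    implied = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in MODULE_PERMISSIONS:
                implied.add(MODULE_PERMISSIONS[top_level])
    return implied


def check_manifest_consistency(state: SkillState) -> None:
    """Flag permissions that the code implies but the manifest omits."""
    for filename, source in sorted(state.helper_sources.items()):
        undeclared = implied_permissions(source) - state.declared_permissions
        for permission in sorted(undeclared):
            state.findings.append({
                "rule": "undeclared-permission",
                "file": filename,
                "permission": permission,
            })


# Usage: the manifest declares only filesystem access, but the helper
# imports modules that imply process and network access.
state = SkillState(
    declared_permissions={"filesystem"},
    helper_sources={
        "helper.py": "import subprocess\nimport urllib.request\nimport json\n",
    },
)
check_manifest_consistency(state)
```

In this sketch the check is deterministic and import-based, which matches the abstract's framing of deterministic analyzers as the enforcement core; it would miss dynamic imports and any behavior expressed only in natural-language instructions, which is where the other analyzers would have to contribute.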
Presentation Mode: Yes, at least one author will attend and present in person.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 72