SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement
Keywords: Agent Skills, Prompt Injection, Coding Agents, Multimodal Attack
Abstract: Agent skills are becoming a core abstraction in coding agents, bundling long-form instructions with auxiliary scripts to extend tool-augmented behavior. This abstraction also introduces an underexplored attack surface: skill-based prompt injection, in which poisoned skills steer agents away from user intent and safety policies. Existing attacks are largely hand-crafted, and naive injections often fail because the malicious intent is too explicit or deviates too far from the original skill. We propose \textsc{SkillJect}, the first automated framework for stealthy prompt injection against agent skills. \textsc{SkillJect} forms a closed loop among three agents: an Attack Agent that synthesizes stealthy injection skills, a Code Agent that executes tasks in a realistic tool environment, and an Evaluate Agent that analyzes action traces to verify whether the targeted malicious behaviors are triggered. We further introduce a payload-hiding strategy that conceals adversarial operations in auxiliary scripts and uses optimized inducement prompts to trigger their execution. Experiments across diverse coding-agent settings and real-world software engineering tasks show that \textsc{SkillJect} achieves consistently high attack success rates under realistic conditions. We also study a multimodal variant, \textsc{SkillJect-Image}, which hides key malicious instructions in visual assets referenced by the skill documentation rather than exposing them in text. This cross-modal design further strengthens the attack, suggesting that visual instruction channels can evade text-centric safety filters in modern coding agents.
Supplementary Material: pdf
Submission Number: 18