Do Agent Skills Speak Safety in Every Language? A Cross-Lingual Security Analysis of the Skills Ecosystem

Haiyue Zhang; Aojie Yuan; Yi Nian; Yue Zhao

Published: 15 May 2026. Last Modified: 20 May 2026. Venue: AgentSkills 2026 Poster. License: CC BY 4.0

Keywords: agent skills, cross-lingual safety, supply chain security, static analysis, LLM agents, multilingual bias

TL;DR: We scan 3,656 Skills in 8 languages and find no statistically significant cross-lingual gap in code-level security findings, but behavioral heuristics flag non-English Skills 3–5× more often, mostly as false positives, revealing multilingual calibration bias in Skills security tooling.

Abstract: Agent Skills are structured SKILL.md packages that augment LLM agents at inference time and have been adopted across Claude Code, Gemini CLI, OpenClaw, Codex, and other agent platforms. The ToxicSkills audit found that 37% of 3,984 Skills contain security flaws, but no study has examined whether this rate is uniform across languages. We present the first cross-lingual security analysis of the Skills ecosystem, scanning 3,656 real-world Skills in 8 languages with a Skills-native scanner (Cisco Skill Scanner v2.0.9) and a general-purpose baseline (Bandit 1.8.0). Code-level security findings (command injection, hardcoded secrets, prompt injection) affect 2.3% of Skills, with no statistically significant difference across languages. However, behavioral flags, particularly social-engineering indicators such as vague or misleading descriptions, are 3–5× more frequent for non-English Skills than for English ones ($p < 0.001$, Bonferroni-corrected): 31.7% of Japanese Skills are flagged, compared with 6.1% of English Skills. A manual precision assessment shows that these behavioral heuristics have low precision overall (10% for English and 0% for Japanese, each on a sample of 20 findings), and that the false-positive problem is most acute for non-English content. We argue that this gap reflects scanner calibration bias and call for multilingual evaluation of Skills security tooling. We release the dataset and analysis scripts to support reproducible research.
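The abstract does not name the significance test behind the Bonferroni-corrected comparison. As a minimal sketch, assuming a two-sided Fisher's exact test of each non-English language's behavioral-flag rate against English, with a Bonferroni factor equal to the number of such comparisons (7 when 8 languages are scanned), the comparison could look like the following. The function names and the `counts` layout are hypothetical; `scipy.stats.fisher_exact` would be the usual library choice, and the test is written out here only to keep the example dependency-free.

```python
import math


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), computed via lgamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    r1, r2 = a + b, c + d  # row totals
    c1 = a + c             # first column total
    n = r1 + r2

    def log_p(x: int) -> float:
        # log P(top-left cell == x) under fixed margins
        return log_comb(r1, x) + log_comb(r2, c1 - x) - log_comb(n, c1)

    observed = log_p(a)
    total = 0.0
    for x in range(max(0, c1 - r2), min(r1, c1) + 1):
        lp = log_p(x)
        if lp <= observed + 1e-7:  # small tolerance so floating-point ties count
            total += math.exp(lp)
    return min(1.0, total)


def compare_to_reference(
    counts: dict[str, tuple[int, int]], reference: str = "en"
) -> dict[str, dict[str, float]]:
    """Compare each language's flag rate with the reference language's rate.

    `counts` maps a (hypothetical) language code to (flagged_skills, total_skills).
    p-values are Bonferroni-adjusted by the number of comparisons made.
    """
    ref_flagged, ref_total = counts[reference]
    ref_rate = ref_flagged / ref_total
    others = [lang for lang in counts if lang != reference]
    m = len(others)  # number of pairwise tests, e.g. 7 for 8 languages

    results = {}
    for lang in others:
        flagged, total = counts[lang]
        p = fisher_exact_two_sided(
            flagged, total - flagged, ref_flagged, ref_total - ref_flagged
        )
        rate = flagged / total
        results[lang] = {
            "rate": rate,
            "ratio_vs_reference": rate / ref_rate if ref_rate > 0 else math.inf,
            "p_bonferroni": min(1.0, p * m),
        }
    return results
```

Given per-language `(flagged, total)` counts taken from scanner output, `compare_to_reference` returns each language's flag rate, its ratio to the English rate, and the Bonferroni-adjusted p-value; the abstract reports only the Japanese and English rates, so no counts are filled in here. For the precision figures, 10% and 0% of 20-finding samples correspond to 2 and 0 confirmed true positives, respectively.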

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 51