Keywords: skill retrieval, multi-stage ranking, agent skills, skill-augmented agents
TL;DR: SkillFlow is a multi-stage retrieval pipeline that, given a task description, efficiently discovers and retrieves the most relevant agent skills from a library of 36K community-contributed skill definitions.
Abstract: AI agents can extend their capabilities at inference time by loading reusable skills into context, yet equipping an agent with too many skills---particularly irrelevant ones---degrades performance.
As community-driven skill repositories grow, agents need a way to selectively retrieve only the most relevant skills from a large library.
We present \textit{SkillFlow}, the first retrieval pipeline for \textit{Agent Skills}, Anthropic's open format that packages reusable procedural knowledge as self-contained SKILL.md bundles. SkillFlow frames skill discovery as an information retrieval problem over a corpus of $\sim$36K community-contributed skill definitions indexed from GitHub.
The pipeline progressively narrows a large candidate set through four stages---dense retrieval, two rounds of cross-encoder reranking, and LLM-based selection---balancing recall and precision at each stage.
We evaluate SkillFlow on two coding benchmarks: SkillsBench, which comprises 87 tasks and 229 matched skills, and Terminal-Bench, which provides 89 tasks but no matched skills. On SkillsBench, SkillFlow-retrieved skills raise Pass@1 from 9.2\% to 16.4\% (a 78.3\% relative improvement, $p_{\text{adj}} = 3.64 \times 10^{-2}$), reaching 84.1\% of the oracle ceiling. On Terminal-Bench, agents readily use the retrieved skills (70.1\% use rate) yet show no performance gain, indicating that retrieval alone is insufficient when the corpus lacks high-quality, executable skills for the target domain.
SkillFlow demonstrates that framing Agent Skill discovery as an information retrieval task is an effective strategy, and that the practical impact of skill-augmented agents hinges on corpus coverage and skill quality---particularly the density of runnable code and bundled artifacts.
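The four-stage narrowing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Skill` type, the `retrieve_skills` function, the per-stage cutoffs, and the tiny example library are all hypothetical, and the scoring functions are simple lexical stand-ins (bag-of-words cosine in place of a dense embedding model, token overlap in place of a cross-encoder, and a score threshold in place of LLM-based selection). The point is only the shape of the funnel, in which each stage sees a smaller candidate set than the one before it.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    # Hypothetical record; a real SKILL.md bundle carries more than this.
    name: str
    description: str


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Stand-in for dense retrieval: bag-of-words cosine, not a learned embedding.
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def overlap(query, doc):
    # Stand-in for a cross-encoder: fraction of query tokens found in the doc.
    q, d = set(tokens(query)), set(tokens(doc))
    return len(q & d) / len(q) if q else 0.0


def top_k(task, skills, score, k):
    return sorted(skills, key=lambda s: score(task, s), reverse=True)[:k]


def retrieve_skills(task, library, k_dense=4, k_rerank1=3, k_rerank2=2, min_score=0.2):
    # Cutoffs are illustrative assumptions; the source does not state them.
    # Stage 1: cheap, recall-oriented retrieval over the whole library.
    stage1 = top_k(task, library, lambda t, s: cosine(t, s.description), k_dense)
    # Stages 2 and 3: two reranking rounds over progressively smaller sets.
    stage2 = top_k(task, stage1, lambda t, s: overlap(t, s.description), k_rerank1)
    full_text = lambda s: s.name + " " + s.description
    stage3 = top_k(task, stage2, lambda t, s: overlap(t, full_text(s)), k_rerank2)
    # Stage 4: final selection; a threshold stands in for the LLM's judgment.
    return [s for s in stage3 if overlap(task, full_text(s)) >= min_score]


library = [
    Skill("pdf-extract", "extract text and tables from pdf files"),
    Skill("git-bisect", "find the commit that introduced a bug with git bisect"),
    Skill("csv-clean", "clean and deduplicate csv files with pandas"),
    Skill("docker-debug", "debug failing docker container builds"),
    Skill("pdf-merge", "merge several pdf files into one"),
]
selected = retrieve_skills("extract tables from a pdf report", library)
print([s.name for s in selected])  # ['pdf-extract']
```

In this toy run the final stage may return fewer skills than the last reranker passes along, which mirrors the abstract's observation that loading irrelevant skills can hurt: the last stage is free to discard candidates rather than fill a fixed quota.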
Presentation Mode: Yes, at least one author will attend and present in person.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 19