SkillFlow: Scalable and Efficient Agent Skill Retrieval System

Fangzhou Li; Pagkratios Tagkopoulos; Ilias Tagkopoulos

Published: 15 May 2026, Last Modified: 23 May 2026. Venue: AgentSkills 2026 Poster. License: CC BY 4.0

Keywords: skill retrieval, multi-stage ranking, agent skills, skill-augmented agents

TL;DR: SkillFlow is a multi-stage retrieval pipeline that, given a task description, efficiently discovers and retrieves the most relevant agent skills from a library of 36K community-contributed skill definitions.

Abstract: AI agents can extend their capabilities at inference time by loading reusable skills into context, yet equipping an agent with too many skills, particularly irrelevant ones, degrades performance. As community-driven skill repositories grow, agents need a way to selectively retrieve only the most relevant skills from a large library. We present SkillFlow, the first retrieval pipeline for Agent Skills, Anthropic's open format that packages reusable procedural knowledge as self-contained SKILL.md bundles. SkillFlow frames skill discovery as an information retrieval problem over a corpus of roughly 36K community-contributed skill definitions indexed from GitHub. The pipeline progressively narrows a large candidate set through four stages (dense retrieval, two rounds of cross-encoder reranking, and LLM-based selection), balancing recall and precision at each stage. We evaluate SkillFlow on two coding benchmarks: SkillsBench, which provides 87 tasks and 229 matched skills, and Terminal-Bench, which provides 89 tasks but no matched skills. On SkillsBench, SkillFlow-retrieved skills raise Pass@1 from 9.2% to 16.4% (a 78.3% relative improvement, $p_{\text{adj}} = 3.64 \times 10^{-2}$), reaching 84.1% of the oracle ceiling. On Terminal-Bench, agents readily use the retrieved skills (70.1% use rate) yet show no performance gain, revealing that retrieval alone is insufficient when the corpus lacks high-quality, executable skills for the target domain. SkillFlow demonstrates that framing Agent Skill discovery as an information retrieval task is an effective strategy, and that the practical impact of skill-augmented agents hinges on corpus coverage and skill quality, particularly the density of runnable code and bundled artifacts.
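The four-stage narrowing described in the abstract can be pictured as a funnel in which each stage rescores the surviving candidates and keeps fewer of them. The sketch below is only an illustration of that shape, not the authors' implementation: the names `Skill`, `Stage`, `run_funnel`, and `token_overlap` are hypothetical, the per-stage cutoffs are invented, and a toy token-overlap scorer stands in for the embedding model, the cross-encoders, and the LLM selector, none of which the abstract specifies.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Skill:
    name: str
    description: str


# A scorer maps (task, skill) to a relevance score; higher means more relevant.
Scorer = Callable[[str, Skill], float]


@dataclass(frozen=True)
class Stage:
    label: str
    scorer: Scorer
    keep: int  # number of candidates that survive this stage (assumed value)


def run_funnel(task: str, skills: Sequence[Skill], stages: Sequence[Stage]) -> List[Skill]:
    """Rescore the current candidates at each stage and keep the top `keep`."""
    candidates = list(skills)
    for stage in stages:
        ranked = sorted(candidates, key=lambda s: stage.scorer(task, s), reverse=True)
        candidates = ranked[: stage.keep]
    return candidates


def token_overlap(task: str, skill: Skill) -> float:
    """Toy stand-in scorer: Jaccard overlap of lowercase whitespace tokens."""
    a = set(task.lower().split())
    b = set(f"{skill.name} {skill.description}".lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical miniature corpus; the real corpus holds roughly 36K skill definitions.
SKILLS = [
    Skill("pdf-extract", "extract text and tables from pdf files"),
    Skill("git-bisect", "find the commit that introduced a bug with git bisect"),
    Skill("csv-clean", "clean and validate csv files"),
    Skill("docker-debug", "debug failing docker builds"),
]

# Four stages mirroring the abstract; every scorer here is the same toy function.
STAGES = [
    Stage("dense retrieval", token_overlap, keep=3),
    Stage("cross-encoder rerank 1", token_overlap, keep=2),
    Stage("cross-encoder rerank 2", token_overlap, keep=2),
    Stage("LLM-based selection", token_overlap, keep=1),
]

TASK = "extract tables from a pdf report"
selected = run_funnel(TASK, SKILLS, STAGES)
print([s.name for s in selected])
```

In a real system the early stages would presumably use cheap scorers with generous cutoffs to protect recall, while the later stages would spend more compute per candidate to sharpen precision; the funnel structure is what lets the expensive scorers see only a small candidate set.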

Presentation Mode: Yes, at least one author will attend and present in person.

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 19
