SkillCraft: Can LLM Agents Learn to Use Tools Skillfully?

Shiqi Chen; Jingze Gai; Ruochen Zhou; Jinghan Zhang; Tongyao Zhu; Junlong Li; Kangrui Wang; Zihan Wang; Zhengyu Chen; Klara Kaleb; Ning Miao; Siyang Gao; Cong Lu; Manling Li; Junxian He; Yee Whye Teh

Published: 28 Apr 2026, Last Modified: 28 Apr 2026

Venue: MSLD 2026 Poster

License: CC BY 4.0

Keywords: Tool composition, LLM agents, Skill

Abstract: Real-world tool-using language agents operate in long-horizon workflows with recurring substructures (Jimenez et al., 2024; Zhou et al., 2023), where effective behavior requires not just invoking atomic tools but abstracting and reusing higher-level tool compositions. In cognitive science, such repetition gives rise to skill abstraction: intelligence is characterized by efficiently acquiring and recomposing higher-level procedures from experience (Chollet, 2019). This raises a fundamental question: can LLM agents acquire and reuse compositional tool skills that generalize across tasks? Existing benchmarks (Zhou et al., 2023; Xu et al., 2024; Li et al., 2025) fix the toolset at deployment and evaluate each task independently. We address this gap with SkillCraft, a benchmark and protocol designed to elicit, measure, and reward reusable tool compositions, which we call Skills.

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 115
