SkillCraft: Can LLM Agents Learn to Use Tools Skillfully?

Published: 28 Apr 2026, Last Modified: 28 Apr 2026, MSLD 2026 Poster, CC BY 4.0
Keywords: Tool composition, LLM agents, Skill
Abstract: Real-world tool-using language agents operate in long-horizon workflows with recurring substructures (Jimenez et al., 2024; Zhou et al., 2023), where effective behavior requires not just invoking atomic tools but also abstracting and reusing higher-level tool compositions. In cognitive science, such repetition gives rise to skill abstraction: intelligence is characterized by the efficient acquisition and recomposition of higher-level procedures from experience (Chollet, 2019). This raises a fundamental question: can LLM agents acquire and reuse compositional tool skills that generalize across tasks? Existing benchmarks (Zhou et al., 2023; Xu et al., 2024; Li et al., 2025) fix the toolset at deployment and evaluate each task independently, so they do not test whether an agent builds such compositions and carries them over to later tasks. We address this gap with SkillCraft, a benchmark and protocol designed to elicit, measure, and reward reusable tool compositions, which we call Skills.
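The abstract does not say how SkillCraft represents a Skill internally. As a minimal sketch, assuming a Skill is simply a named pipeline of atomic tools in which each step's output feeds the next, the idea of abstracting a recurring composition once and reusing it across tasks can be illustrated in Python. Every name here (`Skill`, `SkillLibrary`, `call`, `invoke`, and the string-processing "tools") is hypothetical and not taken from the benchmark.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical: an atomic tool is modeled as a single-argument function.
Tool = Callable[[Any], Any]


@dataclass
class Skill:
    """A named, reusable composition of atomic tools (here, a simple pipeline)."""
    name: str
    steps: List[str]


@dataclass
class SkillLibrary:
    tools: Dict[str, Tool]
    skills: Dict[str, Skill] = field(default_factory=dict)
    agent_actions: int = 0  # decisions the agent issues
    atomic_calls: int = 0   # underlying tool executions

    def _execute(self, tool_name: str, value: Any) -> Any:
        self.atomic_calls += 1
        return self.tools[tool_name](value)

    def call(self, tool_name: str, value: Any) -> Any:
        """Invoke one atomic tool directly: one agent action per tool call."""
        self.agent_actions += 1
        return self._execute(tool_name, value)

    def register(self, skill: Skill) -> None:
        """Store a composition after checking that every step is a known tool."""
        missing = [s for s in skill.steps if s not in self.tools]
        if missing:
            raise KeyError(f"unknown tools: {missing}")
        self.skills[skill.name] = skill

    def invoke(self, skill_name: str, value: Any) -> Any:
        """Run a stored skill as a single agent action, chaining step outputs."""
        self.agent_actions += 1
        for step in self.skills[skill_name].steps:
            value = self._execute(step, value)
        return value


if __name__ == "__main__":
    lib = SkillLibrary(tools={"strip": str.strip, "lower": str.lower, "split": str.split})
    lib.register(Skill("normalize", ["strip", "lower", "split"]))
    tasks = ["  Hello World ", " Tool USE  ", "Skills  "]
    results = [lib.invoke("normalize", t) for t in tasks]
    print(results)                              # [['hello', 'world'], ['tool', 'use'], ['skills']]
    print(lib.agent_actions, lib.atomic_calls)  # 3 9
```

In this toy run, three tasks share the same substructure: invoking the stored skill costs one agent action per task (3 in total) while still executing 9 atomic tool calls, whereas calling the tools one at a time would take 9 agent actions. This accounting is only an illustrative proxy for the kind of reuse a skill-oriented benchmark might reward, not SkillCraft's actual metric.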
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 115