TaskCraft: Automated Generation of Agentic Tasks

ACL ARR 2025 July Submission682 Authors

28 Jul 2025 (modified: 23 Aug 2025), ACL ARR 2025 July Submission, CC BY 4.0
Abstract: Agentic tasks, which require multi-step problem solving with autonomy, tool use, and adaptive reasoning, are becoming increasingly central to the advancement of NLP and AI. However, existing instruction data lacks tool interaction, and current agentic benchmarks rely on costly human annotation, which limits their scalability. We introduce TaskCraft, an automated workflow for generating difficulty-scalable, multi-tool, and verifiable agentic tasks together with their execution trajectories. TaskCraft expands atomic tasks through depth-based and width-based extensions to create structurally and hierarchically complex challenges. Inspired by bootstrap few-shot learning, we implement a self-evolving prompt optimization strategy that improves sampling success and reduces latency. Supervised fine-tuning (SFT) experiments on multiple LLMs show that TaskCraft data substantially improves multi-hop reasoning and agentic capabilities. Scaling up TaskCraft data and adding RL training yields further gains, achieving state-of-the-art performance on four agentic benchmarks. The resulting dataset contains 41k tool-intensive tasks across varied difficulty levels, including 12.6k tool executions and 5k sub-task decompositions.
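The abstract names two ways of growing an atomic task into a harder one. As a rough illustration only, the toy Python sketch below assumes that a task is a (question, answer) pair, that depth-based extension replaces an entity in the question with a sub-question resolving to that entity, and that width-based extension merges several independent tasks into one compound question. The `Task` class, the `hops` counter, `depth_extend`, `width_extend`, and the example entities are all hypothetical and introduced here for illustration; this is not TaskCraft's released code, which presumably also uses LLMs and tool calls to generate and verify each step.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Task:
    """Toy agentic task: a question with a short, string-verifiable answer."""
    question: str
    answer: str
    hops: int = 1  # illustrative count of sub-steps needed to solve the task


def depth_extend(task: Task, key: str, sub_task: Task) -> Task:
    """Hypothetical depth-based extension.

    Replaces `key` in the question with `sub_task`'s question (whose answer is
    `key`), so the solver must resolve `sub_task` before answering `task`.
    The final answer is unchanged, which keeps the task verifiable.
    """
    if sub_task.answer != key:
        raise ValueError("sub_task must resolve to the key being replaced")
    if key not in task.question:
        raise ValueError("key must appear in the task question")
    new_question = task.question.replace(key, f"(the answer to: {sub_task.question})", 1)
    return Task(new_question, task.answer, task.hops + sub_task.hops)


def width_extend(tasks: Sequence[Task]) -> Task:
    """Hypothetical width-based extension: merge independent tasks into one."""
    if len(tasks) < 2:
        raise ValueError("width extension needs at least two tasks")
    parts = " ".join(f"({i}) {t.question}" for i, t in enumerate(tasks, start=1))
    return Task(
        question=f"Answer each part: {parts}",
        answer="; ".join(t.answer for t in tasks),
        hops=sum(t.hops for t in tasks),
    )


# Usage with made-up toy entities.
atomic = Task("Who founded Acme Corp?", "Jane Doe")
identifier = Task("Which company makes the Roadrunner trap?", "Acme Corp")
deep = depth_extend(atomic, "Acme Corp", identifier)
wide = width_extend([atomic, Task("Where is Acme Corp headquartered?", "Springfield")])
print(deep.question)
print(wide.answer)
```

In this toy model, depth extension preserves the atomic answer and width extension concatenates the atomic answers, so a composed task can still be checked by string match; applying `depth_extend` repeatedly is one simple way to raise the hop count and difficulty.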
Paper Type: Long
Research Area: Generation
Research Area Keywords: agent, generation, LLM, agentic task
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Previous URL: https://openreview.net/forum?id=0hUaVKAXoc
Explanation Of Revisions PDF: pdf
Reassignment Request Area Chair: Yes, I want a different area chair for our submission
Reassignment Request Reviewers: Yes, I want a different set of reviewers
Justification For Not Keeping Action Editor Or Reviewers: We hope this manuscript can receive a more thorough evaluation and discussion.
Software: zip
Data: zip
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: N/A
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: 4
B2 Discuss The License For Artifacts: No
B2 Elaboration: Licenses will be included with the open-source code.
B3 Artifact Use Consistent With Intended Use: No
B3 Elaboration: We will provide the specific address in the camera-ready version
B4 Data Contains Personally Identifying Info Or Offensive Content: N/A
B5 Documentation Of Artifacts: Yes
B5 Elaboration: 4
B6 Statistics For Data: Yes
B6 Elaboration: 4
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: 4
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: 4
C3 Descriptive Statistics: Yes
C3 Elaboration: 4
C4 Parameters For Packages: N/A
D Human Subjects Including Annotators: Yes
D1 Instructions Given To Participants: No
D1 Elaboration: The manual annotations come from the author team.
D2 Recruitment And Payment: N/A
D3 Data Consent: Yes
D3 Elaboration: 4
D4 Ethics Review Board Approval: Yes
D4 Elaboration: 4
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: No
E1 Elaboration: We use AI assistants only to polish the writing of the paper.
Author Submission Checklist: yes
Submission Number: 682