AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents

ICLR 2026 Conference Submission 13912 Authors

18 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Synthetic data, Computer-use agents, Scalable

TL;DR: We present AgentSynth, a scalable pipeline that automatically generates diverse and realistic computer-use tasks and trajectories.

Abstract: We introduce AgentSynth, a scalable and cost-efficient pipeline for automatically synthesizing high-quality tasks and trajectory datasets for generalist computer-use agents. Leveraging information asymmetry, AgentSynth constructs subtasks that are simple during generation but significantly more challenging when composed into long-horizon tasks, enabling the creation of over 6,000 diverse and realistic tasks. A key strength of AgentSynth is its ability to precisely modulate task complexity by varying the number of subtasks. Empirical evaluations show that state-of-the-art LLM agents suffer a steep performance drop, from 18% success at difficulty level 1 to just 4% at level 6, highlighting the benchmark's difficulty and discriminative power. Moreover, our pipeline achieves a low average cost of $0.60 per trajectory, orders of magnitude cheaper than human annotations. Code is available in the supplementary materials.
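
The idea of modulating difficulty by the number of composed subtasks can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how subtasks are chained, so the prefix-composition rule below (level k uses the first k subtasks), the `Task` class, the `tasks_by_level` function, and the example subtask strings are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    level: int        # assumed: difficulty level = number of composed subtasks
    subtasks: tuple   # subtask descriptions, in execution order


def tasks_by_level(subtask_chain):
    """Return one task per difficulty level from an ordered chain of subtasks.

    Assumption: the level-k task is the composition of the first k subtasks,
    so a chain of n subtasks yields tasks at levels 1 through n.
    """
    return [
        Task(level=k, subtasks=tuple(subtask_chain[:k]))
        for k in range(1, len(subtask_chain) + 1)
    ]


# Hypothetical subtask chain, invented for illustration only.
chain = [
    "Open the spreadsheet application",
    "Create a new sheet named 'Budget'",
    "Enter monthly expenses in column A",
]
tasks = tasks_by_level(chain)
```

Under this assumption, each subtask is easy to generate on its own, while an agent given only the composed level-k task must carry out all k steps in sequence, which is one way the reported drop in success at higher levels could arise.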

Supplementary Material: zip

Primary Area: foundation or frontier models, including LLMs

Submission Number: 13912
