CuES: Bottom-Up Exploration and Top-Down Guidance for Agentic Data Synthesis

11 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Agentic RL; Data Synthesis
Abstract: Training LLM-based agents with reinforcement learning (RL) in complex environments requires high-quality, environment-specific data. However, generating tasks that are semantically coherent, behaviorally valid, and executable is prohibitively expensive, making the scarcity of such data a fundamental bottleneck for scaling capable agents. Existing synthesis methods struggle to balance high-level intent with environmental grounding, often producing either unexecutable instructions or aimless, low-quality trajectories. To address this dilemma, we propose \textbf{CuES}, a \textbf{Cu}riosity-driven and \textbf{E}nvironment-grounded framework for agentic data \textbf{S}ynthesis that operates without predefined queries. CuES first uses curiosity-driven exploration to uncover a foundation of fundamentally solvable interaction patterns, ensuring executability by design. Concurrently, top-down guidance expands exploration and task diversity while keeping generated tasks aligned with user intentions. Experiments on AppWorld, BFCL, and WebShop show that CuES generates diverse, executable, and high-quality training tasks, matching or surpassing the diversity and effectiveness of manually curated datasets and delivering strong downstream RL performance, which makes it possible to train environment-specific agents cost-effectively and efficiently. The code is available at \url{https://github.com/Anonymize-Author/CuES}.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 3951