CuES: Bottom-Up Exploration and Top-Down Guidance for Agentic Data Synthesis

11 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Agentic RL; Data Synthesis
Abstract: Training LLM-based agents with reinforcement learning (RL) in complex environments requires high-quality, environment-specific data. However, generating tasks that are semantically coherent, behaviorally valid, and executable is prohibitively expensive, making the scarcity of such data a fundamental bottleneck for scaling capable agents. Existing synthesis methods struggle to balance high-level intent with environmental grounding, often producing either unexecutable instructions or aimless, low-quality trajectories. To address this dilemma, we propose \textbf{CuES}, a \textbf{Cu}riosity-driven and \textbf{E}nvironment-grounded framework for agentic data \textbf{S}ynthesis that operates without predefined queries. CuES first uses curiosity-driven exploration to uncover a foundation of fundamentally solvable interaction patterns, ensuring executability by design. Concurrently, top-down guidance expands exploration and task diversity while keeping generated tasks aligned with user intentions. Experiments on AppWorld, BFCL, and WebShop show that CuES generates diverse, executable, and high-quality training tasks, matching or surpassing the diversity and effectiveness of manually curated datasets and delivering strong downstream RL performance, which makes it possible to train environment-specific agents cost-effectively and efficiently. The code is available at \url{https://github.com/Anonymize-Author/CuES}.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 3951