VULCAN: Where Agents Learn by Living in Simulated Tool Environments

Published: 02 Mar 2026, Last Modified: 02 Mar 2026
ICLR 2026 Workshop DATA-FM
License: CC BY 4.0
Keywords: Synthetic Data Generation, Agentic Environments, Tool Learning, LLM-agents
Abstract: Large Language Model (LLM) agents often excel in narrow tool-use settings but struggle to generalize across diverse environments due to a scarcity of scalable, high-quality training data. Traditional data collection relies on manual environment setup and live-system access, which is labor-intensive and difficult to scale. To address this, we propose VULCAN, a three-phase framework that automatically synthesizes executable, deterministic tool-use environments directly from given tool schemas. VULCAN generates diverse task variants and extracts high-fidelity agent trajectories by executing an LLM-based agent in these simulated environments. Using this approach, we simulate eight distinct domains from the Berkeley Function-Calling Leaderboard (BFCL) benchmark and synthesize around 33K training examples from a seed of only 129 tools. We then fine-tune Phi4-14B on this synthetic data, achieving a 12.2% improvement on the BFCL evaluation over the base model. The resulting VULCAN-tuned model outperforms the strong baseline o4-mini and approaches the performance of Claude-4.5-Opus, a much larger frontier model. These results suggest that VULCAN provides a scalable, domain-agnostic pathway to improve the reasoning and generalization capabilities of tool-using language agents.
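To make the abstract's pipeline concrete, here is a minimal sketch of one idea it describes: turning a tool schema into a deterministic simulated backend so that replaying an agent trajectory always yields the same observations. The schema format, function names, and response fields below are illustrative assumptions, not VULCAN's actual interface.

```python
import json
import random

# Hypothetical tool schema; the fields are illustrative, not VULCAN's format.
SCHEMA = {
    "name": "get_flight_status",
    "parameters": {"flight_id": "string"},
    "returns": {"status": "string", "gate": "string"},
}

def build_simulated_tool(schema, seed=0):
    """Create a deterministic fake backend for a tool schema.

    Each distinct argument combination maps to a stable, reproducible
    response, so executing the same call twice gives the same observation.
    """
    def tool(**kwargs):
        # Derive a per-call seed from the global seed and the arguments.
        key = json.dumps(kwargs, sort_keys=True)
        rng = random.Random(f"{seed}:{schema['name']}:{key}")
        return {
            "status": rng.choice(["on_time", "delayed", "cancelled"]),
            "gate": f"G{rng.randint(1, 40)}",
        }
    return tool

def run_episode(tool, calls):
    """Execute a scripted list of tool calls and record the trajectory."""
    trajectory = []
    for args in calls:
        obs = tool(**args)
        trajectory.append({"call": args, "observation": obs})
    return trajectory

tool = build_simulated_tool(SCHEMA, seed=42)
traj = run_episode(tool, [{"flight_id": "UA123"}, {"flight_id": "UA123"}])
# Determinism: repeating the same call yields the same observation.
assert traj[0]["observation"] == traj[1]["observation"]
```

In the paper's setting, the scripted calls would instead come from an LLM agent acting in the environment, and the recorded trajectories would form the synthetic training data.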
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 142