VULCAN: Where Agents Learn by Living in Simulated Tool Environments

Published: 02 Mar 2026, Last Modified: 01 Apr 2026 · ICLR 2026 Workshop DATA-FM · CC BY 4.0
Keywords: Synthetic Data Generation, Agentic Environments, Tool Learning, LLM-agents
Abstract: Large Language Model (LLM) agents perform well in narrow tool-use settings but struggle to generalize across diverse environments due to a lack of high-quality training data. Existing data collection methods rely on manual environment setup and access to live systems, making them labor-intensive and difficult to scale. To address this, we propose VULCAN, a three-phase framework that automatically constructs executable, deterministic tool-use environments directly from tool schemas, generates diverse task variants, and collects high-fidelity agent trajectories by running LLM agents in these simulated environments. Using VULCAN, we simulate 14 environments and generate 78K high-quality training examples from only 232 tools. We conduct extensive experiments across 4 benchmarks and 2 model families (5 models in total), fine-tuning Qwen2-3 and Phi-series models on the synthesized data. Our approach yields consistent improvements of 12.2%, 10.5%, and 15.8% over the base versions of these models across evaluation settings. The resulting models outperform strong baselines such as GPT-5.2 and approach the performance of Kimi-K2, a significantly larger frontier model. These results demonstrate that VULCAN provides a scalable, domain-agnostic approach for improving the reasoning and generalization capabilities of tool-using LLM agents.
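The three phases described in the abstract can be illustrated with a minimal sketch. All function and variable names below are hypothetical (the paper's actual implementation is not specified here); the sketch only shows the shape of the pipeline: a tool schema is turned into an executable, deterministic stub; a task template is expanded into variants; and an agent run against the stub yields a logged trajectory.

```python
# Hypothetical sketch of a VULCAN-style pipeline; names are illustrative,
# not taken from the paper.
import hashlib
import json

def build_simulated_tool(schema):
    """Phase 1: turn a tool schema into an executable, deterministic stub.

    The stub hashes its inputs, so repeated calls with identical arguments
    always return the same response -- no live backend is needed.
    """
    def tool(**kwargs):
        key = json.dumps({"tool": schema["name"], "args": kwargs}, sort_keys=True)
        digest = hashlib.sha256(key.encode()).hexdigest()[:8]
        return {"tool": schema["name"], "result_id": digest, "args": kwargs}
    return tool

def generate_task_variants(template, slot_fills):
    """Phase 2: expand a task template into diverse concrete task variants."""
    return [template.format(**fill) for fill in slot_fills]

def collect_trajectory(task, tool, scripted_calls):
    """Phase 3: run an agent (scripted here for simplicity) in the simulated
    environment and record the full interaction trajectory."""
    trajectory = [{"role": "user", "content": task}]
    for call in scripted_calls:
        observation = tool(**call)
        trajectory.append({"role": "tool", "call": call, "observation": observation})
    return trajectory

# Usage: one schema -> one simulated environment -> task variants -> trajectory.
schema = {"name": "get_weather", "params": {"city": "str"}}
weather = build_simulated_tool(schema)
tasks = generate_task_variants("What is the weather in {city}?",
                               [{"city": "Paris"}, {"city": "Tokyo"}])
traj = collect_trajectory(tasks[0], weather, [{"city": "Paris"}])

# Determinism check: identical calls yield identical observations.
assert weather(city="Paris") == weather(city="Paris")
```

In a real system the scripted calls would be replaced by an LLM agent choosing tool invocations, but the determinism property shown here is what makes collected trajectories reproducible and verifiable.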
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 142