LAM SIMULATOR: Advancing Data Generation for Large Action Model Training via Online Exploration and Feedback Simulation

ACL ARR 2025 February Submission 5592 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Large Action Models (LAMs) for AI Agents offer incredible potential but face challenges due to the need for high-quality training data, especially for multi-step tasks that involve planning, executing tool calls, and responding to feedback. To address these issues, we present LAM SIMULATOR, a comprehensive framework designed for online exploration of agentic tasks with high-quality feedback. Our framework features a dynamic task query generator, an extensive collection of tools, and an interactive environment where Large Language Model (LLM) Agents can call tools and receive real-time feedback. This setup enables LLM Agents to explore and solve tasks independently and to discover multiple approaches to any given task. The generated data are then used to build high-quality training datasets for LAMs. Our research shows that LAM SIMULATOR enables LLM Agents to autonomously solve tasks while automating the creation of high-quality training data. Models trained with these self-generated datasets demonstrated significant performance gains, achieving up to a 49.3% improvement over their own baselines, most notably in experiments on the ToolBench and CRMArena environments. The process requires minimal human input during dataset creation, highlighting LAM SIMULATOR's efficiency and effectiveness in accelerating AI agent development.
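As a rough illustration of the loop the abstract describes, the sketch below shows how a dynamic task query generator, a tool-calling agent, and an interactive feedback environment might compose into a data-collection pipeline. Every name here (Trajectory, ToolEnvironment, explore, agent_step) is a hypothetical placeholder for illustration, not the paper's actual interface.

```python
# Minimal sketch (assumed names, not the paper's API) of the exploration-and-
# feedback pipeline: generate task queries, let the agent call tools in an
# environment that returns real-time feedback, and keep solved trajectories
# as self-generated training data.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Trajectory:
    query: str
    steps: list = field(default_factory=list)   # (action, feedback) pairs
    solved: bool = False

class ToolEnvironment:
    """Executes tool calls and returns feedback to the agent."""
    def __init__(self, tools: dict[str, Callable]):
        self.tools = tools  # tool name -> callable

    def step(self, name: str, **kwargs):
        if name not in self.tools:
            return {"ok": False, "feedback": f"unknown tool: {name}"}
        try:
            return {"ok": True, "feedback": self.tools[name](**kwargs)}
        except Exception as exc:
            return {"ok": False, "feedback": f"tool error: {exc}"}

def explore(query_gen: Callable[[], str],
            agent_step: Callable[[str, list], dict],
            env: ToolEnvironment,
            n_tasks: int,
            max_steps: int = 8) -> list[Trajectory]:
    """Agent explores tasks autonomously; solved trajectories are kept."""
    dataset = []
    for _ in range(n_tasks):
        query = query_gen()                         # dynamic task query generator
        traj = Trajectory(query=query)
        for _ in range(max_steps):
            action = agent_step(query, traj.steps)  # LLM proposes next tool call
            if action.get("final"):                 # agent declares the task done
                traj.solved = True
                break
            obs = env.step(action["tool"], **action.get("args", {}))
            traj.steps.append((action, obs["feedback"]))  # real-time feedback
        if traj.solved:
            dataset.append(traj)                    # high-quality training example
    return dataset
```

In the actual framework, success would presumably be judged from the environment's feedback rather than the agent's own claim; this sketch only conveys the overall control flow.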
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: llm agent, function calling, data generation, applications, interactive and collaborative generation
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 5592