Keywords: large language model, LLM-based agent, self-improvement, evaluation
TL;DR: We introduce AgentGym, an interactive framework with diverse scenarios for developing LLM-based agents. It also includes expanded instructions, trajectories, and a benchmark. We explore agent self-evolution across environments with the AgentEvol method.
Abstract: Large language models (LLMs), with their generalized capabilities, are considered a promising foundation for building generally-capable agents that can handle multi-turn decision-making tasks across various interactive environments. Previous attempts typically gather expert-provided trajectories and have LLM-based agents imitate these trajectories step by step. However, this supervised fine-tuning approach depends heavily on human supervision, limiting scalability and restricting the agent's exploration and learning in the environments. In this paper, we take the first step towards developing generally-capable LLM-based agents that can explore and evolve themselves across diverse environments. To achieve this, we identify a trinity of ingredients: 1) diverse interactive environments for agent exploration, 2) a trajectory set to equip agents with basic capabilities and prior knowledge, and 3) an effective and scalable approach for agent improvement across environments. We propose AgentGym, a new interactive framework featuring various real-world scenarios and environments for broad, unified, real-time, and concurrent agent exploration. AgentGym also includes a database with expanded instructions, high-quality trajectories, and a benchmark suite. Next, we investigate the potential of agent self-evolution across various environments with a derived exploration-learning method named AgentEvol. Experimental results show that the evolved agents can achieve results comparable to state-of-the-art (SOTA) models. We will release the code, dataset, benchmark, and checkpoints.
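The abstract describes an alternating exploration-learning cycle: agents explore diverse environments to collect trajectories, then learn from the rewarded ones. As a rough illustration of that idea (not the paper's actual algorithm or API; all class and function names here are hypothetical, and the tabular "agent" is a toy stand-in for an LLM policy), a minimal explore-then-learn loop might look like:

```python
# Hypothetical sketch of an explore-then-learn loop in the spirit of AgentEvol.
# All names (ToyEnv, Agent, evolve) are illustrative, not the paper's API.
import random


class ToyEnv:
    """A trivial environment: the agent is rewarded for one target action."""
    def __init__(self, target):
        self.target = target

    def reward(self, action):
        return 1.0 if action == self.target else 0.0


class Agent:
    """A toy tabular 'policy': preference weights over a fixed action set."""
    def __init__(self, actions):
        self.weights = {a: 1.0 for a in actions}

    def act(self, rng):
        actions = list(self.weights)
        total = sum(self.weights.values())
        probs = [self.weights[a] / total for a in actions]
        return rng.choices(actions, weights=probs)[0]

    def learn(self, trajectories):
        # Imitate successful trajectories via a reward-weighted update.
        for action, r in trajectories:
            self.weights[action] += r


def evolve(agent, envs, rounds=20, rollouts=16, seed=0):
    rng = random.Random(seed)
    for _ in range(rounds):
        # 1) Exploration: collect (action, reward) trajectories across environments.
        trajs = []
        for env in envs:
            for _ in range(rollouts):
                a = agent.act(rng)
                trajs.append((a, env.reward(a)))
        # 2) Learning: update only on rewarded trajectories.
        agent.learn([t for t in trajs if t[1] > 0])
    return agent


envs = [ToyEnv("search"), ToyEnv("search")]
agent = evolve(Agent(["search", "click", "type"]), envs)
best = max(agent.weights, key=agent.weights.get)
print(best)  # the rewarded action should dominate after evolution
```

In the paper's setting, the toy environments would be replaced by AgentGym's interactive scenarios and the tabular policy by an LLM fine-tuned on its own successful trajectories; only the alternation between exploration and learning carries over from this sketch.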
Supplementary Material: zip
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11260