Keywords: Agent, Language Model, Exploration, Data Generation, Self-Evolving, Iterative Feedback, Imitation Learning, Demonstrations
TL;DR: We propose a self-evolving system for LLM agents that combines exploration and iterative feedback to generate feasible, targeted training data, enabling strong performance without human intervention.
Abstract: Training large language model (LLM) agents to acquire necessary skills and perform diverse tasks within an environment is gaining interest as a means to enable open-endedness.
However, creating the training dataset for their skill acquisition faces several challenges.
Manual trajectory collection requires significant human effort.
An alternative approach, in which LLMs directly propose tasks to learn, often yields invalid tasks, as the LLMs lack knowledge of which tasks are actually feasible.
Moreover, the generated data may not provide a meaningful learning signal, as agents often already perform well on the proposed tasks.
To address these challenges, we propose EXIF, an automatic improvement framework for LLM-powered agents designed to enhance the feasibility of generated target behaviors while accounting for the agents’ capabilities.
Our method adopts an exploration-first strategy by employing an exploration agent (Alice) to train the target agent (Bob) to learn essential skills in the environment.
Specifically, Alice first interacts with the environment to generate a feasible, environment-grounded skill dataset, which is then used to train Bob. Crucially, we incorporate an iterative feedback loop, where Alice evaluates Bob’s performance to identify areas for improvement.
This feedback then guides Alice’s next round of exploration, forming a closed-loop data generation process.
Experiments on WebShop and Crafter demonstrate EXIF’s ability to iteratively expand the capabilities of the trained agent without human intervention, leading to substantial performance improvements.
Interestingly, we observe that setting Alice to the same model as Bob also notably improves performance, demonstrating EXIF’s potential for building a self-evolving system.
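The closed loop described in the abstract (Alice explores, Bob trains on the resulting data, Alice evaluates Bob, and the feedback steers the next exploration round) can be illustrated with a toy sketch. This is not the authors' implementation: `ToyBob`, `alice_explore`, `alice_feedback`, `exif_loop`, the `budget` parameter, and the skill names are all hypothetical, the environment is reduced to a fixed list of feasible skills, and "training" is replaced by simple memorisation.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class ToyBob:
    """Hypothetical target agent; 'training' just memorises demonstrated skills."""
    known: Set[str] = field(default_factory=set)

    def train(self, dataset: List[str]) -> None:
        self.known.update(dataset)

    def can_do(self, skill: str) -> bool:
        return skill in self.known


def alice_explore(feasible: List[str], focus: List[str], budget: int) -> List[str]:
    """Hypothetical exploration step.

    Returns at most `budget` skills, all drawn from `feasible` (so the data
    is environment-grounded), with skills flagged by feedback placed first.
    """
    prioritised = [s for s in focus if s in feasible]
    remaining = [s for s in feasible if s not in focus]
    return (prioritised + remaining)[:budget]


def alice_feedback(bob: ToyBob, feasible: List[str]) -> List[str]:
    """Hypothetical evaluation step: list the feasible skills Bob still fails."""
    return [s for s in feasible if not bob.can_do(s)]


def exif_loop(feasible: List[str], rounds: int, budget: int) -> Tuple[ToyBob, List[int]]:
    """Run explore -> train -> evaluate for `rounds` iterations."""
    bob = ToyBob()
    focus: List[str] = []      # no feedback exists before the first round
    history: List[int] = []    # number of skills Bob knows after each round
    for _ in range(rounds):
        dataset = alice_explore(feasible, focus, budget)
        bob.train(dataset)
        focus = alice_feedback(bob, feasible)
        history.append(len(bob.known))
    return bob, history


if __name__ == "__main__":
    # Illustrative skill names only; not taken from the paper's experiments.
    skills = ["collect_wood", "place_table", "make_pickaxe", "collect_stone"]
    bob, history = exif_loop(skills, rounds=3, budget=2)
    print(history)
```

In this sketch the feedback list is what keeps later rounds from regenerating data for skills Bob already performs well, which mirrors the motivation stated in the abstract; a real system would replace each toy function with LLM-driven interaction and model training.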
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 15731