Keywords: Agent, Language Model, Exploration, Data Generation, Self-Evolving, Iterative Feedback, Imitation Learning, Demonstrations
Abstract: Training large language model (LLM) agents to acquire necessary skills and perform diverse tasks within an environment is gaining interest as a means to enable open-endedness.
However, building the training dataset is difficult, as manual trajectory collection is labor-intensive and LLM-proposed tasks are often infeasible.
Moreover, the generated data may not provide a meaningful learning signal, as agents often already perform well on the proposed tasks.
To address these challenges, we propose EXIF, an automatic improvement framework for LLM-powered agents that is designed to enhance the feasibility of generated target behaviors while accounting for the agents’ capabilities.
Our method adopts an exploration-first strategy, employing an exploration agent (Alice) to generate the data used to train the target agent (Bob) on essential skills in the environment.
Specifically, Alice first interacts with the environment to generate a feasible, environment-grounded skill dataset, which is then used to train Bob. Crucially, we incorporate an iterative feedback loop, where Alice evaluates Bob’s performance to identify areas for improvement.
This feedback then guides Alice’s next round of exploration, forming a closed-loop data generation process.
Experiments on WebShop and Crafter demonstrate EXIF’s ability to iteratively expand the capabilities of the trained agent without human intervention, leading to substantial performance improvements.
Interestingly, we observe that setting Alice to the same model as Bob also notably improves performance, demonstrating EXIF’s potential for building a self-improving system.
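The closed loop described in the abstract (explore, train, evaluate, feed back) can be sketched roughly as follows. This is a minimal toy illustration under stated assumptions, not the authors' implementation: the skill list, `ToyBob`, `alice_explore`, `alice_evaluate`, and `exif_loop` are all hypothetical names, "training" is reduced to memorising demonstrated skills, and no LLM or real environment is involved.

```python
import random
from dataclasses import dataclass, field

# Hypothetical toy skill set; these names are illustrative assumptions only.
SKILLS = ["collect_wood", "place_table", "make_pickaxe", "collect_stone"]


@dataclass
class ToyBob:
    """Target agent stand-in: 'training' just memorises demonstrated skills."""
    learned: set = field(default_factory=set)

    def train(self, dataset):
        # dataset is a list of (skill, trajectory) pairs produced by Alice.
        self.learned.update(skill for skill, _trajectory in dataset)

    def attempt(self, skill):
        return skill in self.learned


def alice_explore(feedback, rng, budget=2):
    """Exploration-first data generation (toy version).

    Alice acts first and labels what she actually did, so every returned
    skill is feasible by construction. Skills named in the feedback are
    prioritised; any remaining budget goes to random exploration.
    """
    targets = list(feedback)[:budget]
    remaining = [s for s in SKILLS if s not in targets]
    rng.shuffle(remaining)
    targets += remaining[: budget - len(targets)]
    return [(skill, f"trajectory_for_{skill}") for skill in targets]


def alice_evaluate(bob):
    """Alice probes Bob and returns the skills he still fails."""
    return [s for s in SKILLS if not bob.attempt(s)]


def exif_loop(rounds=3, seed=0):
    rng = random.Random(seed)
    bob, feedback, history = ToyBob(), [], []
    for _ in range(rounds):
        dataset = alice_explore(feedback, rng)  # 1. explore the environment
        bob.train(dataset)                      # 2. train Bob on the skill data
        feedback = alice_evaluate(bob)          # 3. evaluate Bob, collect feedback
        history.append(len(bob.learned))        # track skills acquired so far
    return bob, history
```

In this sketch, calling `exif_loop()` returns the trained toy agent together with the number of skills it has acquired after each round; the feedback from one round determines which skills Alice demonstrates in the next.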
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: Autonomous agents; LLM agents; environment interaction; fine-tuning
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 7031