Abstract: Large language models (LLMs) have emerged as the core controller of various autonomous agent systems. In this work, we introduce ETO, a method aimed at enhancing the capabilities of open-source LLM agents. Unlike previous work that trains solely on successful expert trajectories, our approach enables agents to learn from exploration failures, improving performance through an iterative exploration-training framework. During the exploration phase, the agent explores the environment and collects failure trajectories to construct contrastive trajectory pairs. In the training phase, the agent leverages this contrastive trajectory information to update its policy. Iterating between exploration and training yields further improvement. Experiments on three agent datasets show that our method consistently outperforms baselines by more than 5% in final reward. Moreover, analyses of task-solving efficiency and of scenarios without expert trajectories further highlight the effectiveness of our method.
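The iterative exploration-training loop the abstract describes can be sketched in a few lines. This is a toy illustration, not the authors' implementation: the names (`rollout`, `contrastive_update`, `eto_loop`) are hypothetical, and the "policy" is reduced to a single scalar success probability so the loop runs end to end.

```python
import random

def rollout(policy, rng):
    """Exploration: one episode; succeed with probability `policy`.

    Returns a (trajectory, reward) pair; the trajectory is a placeholder.
    """
    reward = 1.0 if rng.random() < policy else 0.0
    return ("agent_trajectory", reward)

def contrastive_update(policy, pairs, lr=0.02):
    """Training: nudge the policy using the (success, failure) pairs.

    A stand-in for a real contrastive objective over trajectory pairs.
    """
    return min(1.0, policy + lr * len(pairs))

def eto_loop(policy, expert_trajs, n_iters=3, seed=0):
    rng = random.Random(seed)
    for _ in range(n_iters):
        # Exploration phase: collect failure trajectories and pair each
        # with an expert success to form a contrastive trajectory pair.
        pairs = []
        for expert in expert_trajs:
            traj, reward = rollout(policy, rng)
            if reward == 0.0:
                pairs.append((expert, traj))
        # Training phase: update the policy from the contrastive pairs.
        policy = contrastive_update(policy, pairs)
    return policy

final_policy = eto_loop(0.2, ["expert_trajectory"] * 10)
```

With each iteration, fewer rollouts fail, so fewer contrastive pairs are collected and updates shrink, mirroring the intended convergence of the exploration-training cycle.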
Paper Type: long
Research Area: Machine Learning for NLP
Contribution Types: NLP engineering experiment
Languages Studied: English