Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents

Anonymous

16 Feb 2024, ACL ARR 2024 February Blind Submission
Abstract: Large language models (LLMs) have emerged as the core controller for various autonomous agent systems. In this work, we introduce ETO, a method aimed at enhancing the capabilities of open-source LLM agents. Unlike previous work that trains solely on successful expert trajectories, our approach enables agents to learn from exploration failures, leading to improved performance through an iterative exploration-training framework. During the exploration phase, the agent explores the environment, collecting failure trajectories to construct contrastive trajectory pairs. In the training phase, the agent leverages this trajectory-contrastive information to update its policy. This iterative process of exploration and training facilitates further improvement of the agents. Experiments on three agent datasets show that our method consistently outperforms baselines by more than 5% in final rewards. Moreover, analyses of task-solving efficiency and of the method's potential in scenarios without expert trajectories further highlight its effectiveness.
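For concreteness, below is a minimal, self-contained sketch of the trajectory-contrastive update the abstract describes. The abstract does not name the loss, so this assumes a DPO-style objective over (success, failure) trajectory pairs, one common way to exploit such contrastive information; `dpo_style_pair_loss` and all numeric values are illustrative assumptions, not the authors' implementation.

```python
import math

def dpo_style_pair_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Contrastive loss for one (successful, failed) trajectory pair.

    logp_w / logp_l: summed token log-probabilities of the success/failure
    trajectory under the current policy; ref_logp_w / ref_logp_l: the same
    quantities under a frozen reference policy (e.g., the behavior-cloned
    agent). Minimizing -log sigmoid(beta * margin) widens the policy's
    preference margin for the successful trajectory relative to the reference.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

# The surrounding iterative loop (per the abstract), in outline:
#   repeat:
#     1. exploration: roll out the current agent, keep failed trajectories,
#        and pair each failure with an expert success on the same task;
#     2. training: take gradient steps on the pair loss above, then use the
#        updated policy for the next round of exploration.

# Toy numbers only, to show the shape of the computation: the loss falls as
# the policy assigns relatively more probability to the successful trajectory.
print(dpo_style_pair_loss(-42.0, -40.0, -45.0, -39.0))  # ~0.513
print(dpo_style_pair_loss(-38.0, -44.0, -45.0, -39.0))  # ~0.263
```

In practice each summed log-probability would come from scoring the full action sequence with the LLM; the sigmoid form keeps the update bounded even when failure trajectories are far less likely under the policy than the expert ones.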
Paper Type: long
Research Area: Machine Learning for NLP
Contribution Types: NLP engineering experiment
Languages Studied: English