Abstract: Large language models (LLMs) have emerged as the core controller of various autonomous agent systems. In this work, we introduce ETO, a method aimed at enhancing the capabilities of open-source LLM agents. Unlike previous work that trains solely on successful expert trajectories, our approach enables agents to learn from exploration failures, improving performance through an iterative exploration-training framework. During the exploration phase, the agent explores the environment and collects failure trajectories to construct contrastive trajectory pairs. In the training phase, the agent leverages this contrastive trajectory information to update its policy. Iterating between exploration and training yields further improvement. Experiments on three agent datasets show that our method consistently outperforms baselines by more than 5% in final reward. Moreover, analyses of task-solving efficiency and of scenarios without expert trajectories further highlight the effectiveness of our method.
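The iterative exploration-training loop the abstract describes can be sketched in a few lines. This is a toy illustration, not the authors' implementation: the names (`rollout`, `contrastive_update`, `eto_loop`) are hypothetical, and the "policy" is reduced to a single scalar success probability so the loop runs end to end.

```python
import random

def rollout(policy, rng):
    """Exploration: one episode; succeed with probability `policy`.

    Returns a (trajectory, reward) pair; the trajectory is a placeholder.
    """
    reward = 1.0 if rng.random() < policy else 0.0
    return ("agent_trajectory", reward)

def contrastive_update(policy, pairs, lr=0.02):
    """Training: nudge the policy using the (success, failure) pairs.

    A stand-in for a real contrastive objective over trajectory pairs.
    """
    return min(1.0, policy + lr * len(pairs))

def eto_loop(policy, expert_trajs, n_iters=3, seed=0):
    rng = random.Random(seed)
    for _ in range(n_iters):
        # Exploration phase: collect failure trajectories and pair each
        # with an expert success to form a contrastive trajectory pair.
        pairs = []
        for expert in expert_trajs:
            traj, reward = rollout(policy, rng)
            if reward == 0.0:
                pairs.append((expert, traj))
        # Training phase: update the policy from the contrastive pairs.
        policy = contrastive_update(policy, pairs)
    return policy

final_policy = eto_loop(0.2, ["expert_trajectory"] * 10)
```

With each iteration, fewer rollouts fail, so fewer contrastive pairs are collected and updates shrink, mirroring the intended convergence of the exploration-training cycle.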
Paper Type: long
Research Area: Machine Learning for NLP
Contribution Types: NLP engineering experiment
Languages Studied: English