REX: Rapid Exploration and eXploitation for AI agents

Rithesh R N; Shelby Heinecke; Juan Carlos Niebles; Zhiwei Liu; Le Xue; Weiran Yao; Yihao Feng; Zeyuan Chen; Akash Gokul; Devansh Arpit; Ran Xu; Phil L Mui; Huan Wang; Caiming Xiong; Silvio Savarese

Published: 11 Mar 2024, Last Modified: 22 Apr 2024. Venue: LLMAgents @ ICLR 2024 (Poster). License: CC BY 4.0

Keywords: Large Language Models, AI Agents, MCTS

Abstract: AI agents leveraging the capabilities of Large Language Models (LLMs) and Reinforcement Learning (RL) techniques have garnered growing attention due to their strong performance in autonomously executing real-world tasks. Effective exploration of the action space is essential for these agents to accomplish diverse tasks. In this paper, we propose an enhanced approach for $\textbf{R}$apid $\textbf{E}$xploration and e$\textbf{X}$ploitation of the action space for LLM-based AI agents, called $\textbf{REX}$. Existing LLM-driven agents have inherent limitations, such as a heavy reliance on precise descriptions for decision-making and the lack of a systematic approach to leveraging try-and-fail procedures akin to traditional RL. To overcome these challenges, REX introduces an additional layer of rewards and integrates concepts similar to Upper Confidence Bound (UCB) scores, leading to more robust and efficient AI agent performance. UCB scores influence the agent's decision-making process, which involves predicting the next best action. Because the method does not require any model fine-tuning, it can make use of offline behaviors from logs and integrate seamlessly with existing foundation models. In a comparative analysis with existing methods such as Chain-of-Thought (CoT) and Reflexion, REX-based methods demonstrate comparable performance and, in certain cases, surpass the results achieved by these techniques. Notably, REX-based methods substantially reduce execution time while systematically exploring the action space of AI agents, enhancing their practical applicability across a diverse set of scenarios.
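
For readers unfamiliar with UCB scores, the following is a minimal sketch of the standard UCB1 rule (mean reward plus an exploration bonus), not the REX algorithm itself; the abstract only says REX uses "concepts similar to" UCB. The function names, the `stats` layout, and the default exploration constant `c` are assumptions made for illustration.

```python
import math


def ucb1_score(total_reward, visits, total_visits, c=math.sqrt(2)):
    """Standard UCB1 score: average reward plus an exploration bonus.

    Illustrative only; REX's exact scoring is not given in the abstract.
    """
    if visits == 0:
        return math.inf  # an untried action is always explored first
    mean_reward = total_reward / visits
    bonus = c * math.sqrt(math.log(total_visits) / visits)
    return mean_reward + bonus


def select_action(stats, c=math.sqrt(2)):
    """Pick the action with the highest UCB1 score.

    stats maps each action name to a (total_reward, visits) pair
    (a hypothetical layout chosen for this sketch).
    """
    total_visits = sum(visits for _, visits in stats.values())
    return max(stats, key=lambda a: ucb1_score(*stats[a], total_visits, c))
```

The bonus term shrinks as an action is tried more often, so the rule shifts from exploration toward exploitation of actions that have earned higher rewards.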

Submission Number: 20
