Prospector: Improving LLM Agents with Self-Asking and Trajectory Ranking

Published: 07 Nov 2023, Last Modified: 14 Dec 2023 · FMDM@NeurIPS 2023
Keywords: Large Language Models, LLM Agents, Chain-of-Thoughts, Reward Models
Abstract: Large language models (LLMs) have shown the ability to solve complex decision-making tasks beyond natural language processing tasks. Current LLM agents such as ReAct can solve interactive decision-making tasks by imitating the few-shot demonstrations given in the prompt. LLM agents based on few-shot in-context learning (ICL) achieve surprisingly high performance without training. Despite their simplicity and generalizability, however, ICL-based approaches do not optimize trajectories based on the reward from the environment. In this paper, we introduce Prospector, an LLM agent that consists of two complementary LLMs: an LLM Actor and an LLM Critic. To elicit more appropriate actions from the LLM Actor, we propose AskAct prompting, which interleaves additional self-asking steps into the few-shot demonstrations. Furthermore, to take advantage of the stochasticity of LLMs, we propose Trajectory Ranking, in which the LLM Actor generates diverse (creative) trajectories at high temperature and the LLM Critic selects the most rewarding trajectory by predicting the expected total reward of each one. On representative decision-making benchmark environments such as ALFWorld and WebShop, we empirically demonstrate that Prospector considerably increases the success rate on the given tasks, outperforming recent methods such as ReAct and Reflexion.
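The Trajectory Ranking procedure described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `generate_trajectory` and `predict_total_reward` are hypothetical stand-ins for the LLM Actor (sampled at high temperature) and the LLM Critic (which predicts the expected total reward of a trajectory).

```python
import random

def generate_trajectory(task, temperature):
    # Hypothetical stand-in for the LLM Actor: in Prospector this would
    # sample a full (thought, action, observation) rollout at the given
    # temperature. Here we fake a variable-length trajectory.
    return [("think", "act", "observe")] * random.randint(1, 4)

def predict_total_reward(task, trajectory):
    # Hypothetical stand-in for the LLM Critic: in Prospector this would
    # prompt an LLM to predict the expected total reward of the rollout.
    # Here we return a random score for demonstration.
    return random.random()

def trajectory_ranking(task, k=8, temperature=1.0):
    """Sample k diverse trajectories at high temperature, then return
    the trajectory the critic scores highest."""
    trajectories = [generate_trajectory(task, temperature) for _ in range(k)]
    return max(trajectories, key=lambda t: predict_total_reward(task, t))
```

The design exploits the fact that sampling at high temperature yields diverse candidate rollouts; the critic then acts as a reranker, so no gradient updates to the actor are needed.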
Submission Number: 60