Keywords: Large language models, Agent, Optimal decision-making
TL;DR: We achieve zero-shot optimal decision-making for LLM agents by integrating the respective advantages of LLMs and RL.
Abstract: Current large language model (LLM) agents succeed in making zero-shot decisions but struggle to make optimal decisions, as they rely on pre-trained probabilities rather than maximizing expected future rewards. In contrast, agents trained via reinforcement learning (RL) can make optimal decisions but require extensive data. We develop an algorithm that combines the zero-shot capabilities of LLMs with the optimization capability of RL, referred to as the Model-based LLM Agent with Q-Learning (MLAQ). MLAQ employs Q-learning to derive optimal policies from transitions stored in memory. Unlike conventional RL agents, MLAQ constructs an LLM-based imagination space, in which a UCB variant generates imaginary data through interactions with the LLM-based world model to derive zero-shot policies. This approach achieves a sub-linear regret bound, as guaranteed by our theorem. Moreover, MLAQ employs a mixed-examination mechanism to further enhance the quality of the imaginary data. We evaluate MLAQ on benchmarks that present significant challenges for existing LLM agents. Results show that MLAQ achieves an optimal rate of over 90\% on tasks where other methods struggle to succeed. Additional experiments indicate that introducing model-based RL into LLM agents holds significant potential for optimal decision-making. Our website is available at http://mlaq.site/.
Submission Number: 26
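To illustrate the general idea the abstract describes, the following is a minimal, hypothetical Python sketch: tabular Q-learning over imagined transitions produced by a stand-in world model, with a UCB-style exploration bonus for action selection. This is not the authors' MLAQ implementation; the world model here is a toy function standing in for the LLM-based world model, and all names (`world_model`, `select_action_ucb`, `imagine_episode`) are illustrative assumptions.

```python
# Illustrative sketch (not the authors' implementation): Q-learning over
# imagined transitions from a stand-in world model, with a UCB-style bonus
# for action selection. In MLAQ the world model is LLM-based; here it is a
# toy deterministic function.
import math
from collections import defaultdict

ACTIONS = ["left", "right"]
GAMMA, ALPHA, C_UCB = 0.95, 0.5, 1.0

Q = defaultdict(float)   # Q[(state, action)] -> value estimate
N = defaultdict(int)     # visit counts used by the UCB bonus


def world_model(state, action):
    """Stand-in world model: returns (next_state, reward, done)."""
    if state == "goal":
        return state, 0.0, True
    next_state = "goal" if action == "right" else "start"
    reward = 1.0 if next_state == "goal" else 0.0
    return next_state, reward, next_state == "goal"


def select_action_ucb(state):
    """Pick the action maximizing Q-value plus a UCB exploration bonus."""
    total = sum(N[(state, a)] for a in ACTIONS) + 1

    def score(a):
        bonus = C_UCB * math.sqrt(math.log(total) / (N[(state, a)] + 1))
        return Q[(state, a)] + bonus

    return max(ACTIONS, key=score)


def imagine_episode(start_state="start", horizon=10):
    """Roll out one imagined episode and update Q with standard Q-learning."""
    state = start_state
    for _ in range(horizon):
        action = select_action_ucb(state)
        next_state, reward, done = world_model(state, action)
        N[(state, action)] += 1
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        target = reward + (0.0 if done else GAMMA * best_next)
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        if done:
            break
        state = next_state


if __name__ == "__main__":
    for _ in range(200):
        imagine_episode()
    print({k: round(v, 3) for k, v in Q.items()})
```

In this toy setting the learned Q-values favor the action leading to the goal state; the actual algorithm additionally filters imagined transitions (the mixed-examination mechanism) before they enter memory.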