Can Language Agents Approach the Performance of RL? An Empirical Study On OpenAI Gym

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: LLM Agents, Benchmark, Reinforcement Learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: The formidable capacity of language agents for zero- or few-shot decision-making encourages us to pose a compelling question: *Can language agents approach the performance of reinforcement learning (RL) in traditional sequential decision-making tasks, and can they exhibit greater efficacy?* To investigate this, we first develop a $\texttt{TextGym}$ simulator by grounding OpenAI Gym in a textual environment. Given the widespread adoption of OpenAI Gym, this allows for straightforward comparisons between RL agents and language agents. To ensure fair and effective benchmarking, we introduce five scenario levels for precise control of domain knowledge, together with a unified RL-inspired framework for language agents. Additionally, we propose an innovative explore-exploit-guided language ($\texttt{EXE}$) agent to solve the severely partially observable and sparse-reward tasks within $\texttt{TextGym}$. Through numerical experiments and ablation studies, we extract valuable insights into the decision-making capabilities of language agents and evaluate their potential to compete with RL on classical sequential decision-making problems. This paper sheds light on the performance of language agents and paves the way for future research in this exciting domain.
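The abstract describes grounding OpenAI Gym in a textual environment so that language agents can act on natural-language observations. A minimal sketch of that idea is shown below; the class names, the observation template, and the toy environment are illustrative assumptions, not the paper's actual $\texttt{TextGym}$ code (which is not included in this page):

```python
class ToyCartPole:
    """Tiny stand-in for a Gym-style environment (not the real CartPole).

    It only mimics the reset()/step() interface; state is [position, velocity].
    """
    def reset(self):
        self.state = [0.0, 0.0]
        return self.state

    def step(self, action):
        # Action 1 pushes the cart right by 0.1, any other action pushes left.
        self.state[0] += 0.1 if action == 1 else -0.1
        reward = 1.0
        done = abs(self.state[0]) > 1.0
        return self.state, reward, done, {}


class TextWrapper:
    """Hypothetical wrapper that turns numeric observations into sentences,
    in the spirit of grounding a Gym environment into text."""
    def __init__(self, env, template):
        self.env = env
        self.template = template  # e.g. "The cart is at position {0:.1f} ..."

    def reset(self):
        return self.template.format(*self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self.template.format(*obs), reward, done, info


env = TextWrapper(ToyCartPole(),
                  "The cart is at position {0:.1f} with velocity {1:.1f}.")
print(env.reset())        # "The cart is at position 0.0 with velocity 0.0."
text_obs, r, done, _ = env.step(1)
print(text_obs)           # "The cart is at position 0.1 with velocity 0.0."
```

A language agent would then receive the rendered sentence as its observation and reply with an action, whereas an RL agent would consume the underlying numeric state, which is what makes side-by-side comparison on the same environments straightforward.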
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7636