MENSA: Leveraging Mental Simulation for In-Context Policy Improvement in LLM Agents
Keywords: Agent Architectures, Large Language Model, In-Context Policy Improvement
TL;DR: We propose the Mental Simulation Agent (MENSA) for interactive sequential decision-making tasks; evaluated on the challenging ScienceWorld and NetHack environments, it outperforms the previous state of the art by a large margin.
Abstract: Large Language Model (LLM) powered agents have shown promise in sequential decision-making tasks in interactive environments. However, prior agent frameworks usually rely on advanced LLM capabilities such as planning or instruction following to carry out tasks successfully. Effectively improving the performance of an LLM agent without assuming these capabilities remains challenging. To address this issue, we propose the MENtal Simulation Agent (MENSA), a novel model-based approach that enhances LLM-based agents without fine-tuning. MENSA leverages the fundamental ability of any LLM, text completion, to generate forecasts of action-state pairs (i.e., transitions) for future time steps. These forecasts are used to construct a set of relevant past experiences, which are provided to the LLM agent in context to improve its decision-making behavior. We evaluate MENSA in two challenging interactive environments, ScienceWorld and NetHack, and show that MENSA improves performance across LLMs of various sizes. Using large models (e.g., GPT-4o-mini), MENSA outperforms previous state-of-the-art methods by +15.8 points in ScienceWorld and by +40.0 points in NetHack. Even with smaller models like Phi-3-mini, MENSA achieves a gain of +11.9 points in ScienceWorld. Our results further suggest that MENSA is less affected by an LLM's limitations in instruction following and planning than the baselines are.
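The abstract outlines a concrete loop: forecast future transitions via text completion, retrieve relevant past experiences, and act with those experiences in context. Below is a minimal, hypothetical Python sketch of one such decision step, based only on the abstract; the function names (llm_complete, forecast_transitions, retrieve_experiences, choose_action) and the toy word-overlap retrieval are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a MENSA-style decision step (not the paper's code).
from typing import Callable, List, Tuple

Transition = Tuple[str, str]  # (action, resulting state), both as plain text


def forecast_transitions(llm_complete: Callable[[str], str],
                         observation: str, horizon: int = 3) -> List[Transition]:
    """Mental simulation: ask the LLM to continue the trajectory as text."""
    prompt = (f"Current state: {observation}\n"
              f"Continue the trajectory for {horizon} steps as 'action -> state' lines:\n")
    lines = llm_complete(prompt).strip().splitlines()[:horizon]
    return [tuple(part.strip() for part in ln.split("->", 1))
            for ln in lines if "->" in ln]


def retrieve_experiences(forecast: List[Transition],
                         memory: List[str], k: int = 3) -> List[str]:
    """Pick past experiences most similar to the forecast (toy word-overlap score)."""
    query = set(" ".join(a + " " + s for a, s in forecast).lower().split())
    scored = sorted(memory, key=lambda m: -len(query & set(m.lower().split())))
    return scored[:k]


def choose_action(llm_complete: Callable[[str], str],
                  observation: str, memory: List[str]) -> str:
    """One step: simulate forward, retrieve experiences, then act with them in context."""
    forecast = forecast_transitions(llm_complete, observation)
    context = "\n".join(retrieve_experiences(forecast, memory))
    prompt = (f"Relevant past experiences:\n{context}\n\n"
              f"Current state: {observation}\nNext action:")
    return llm_complete(prompt).strip()
```

The sketch assumes the caller supplies an `llm_complete(prompt) -> str` function wrapping whatever model is used; only plain text completion is required, matching the abstract's claim that no planning or instruction-following capability is assumed.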
Area: Generative and Agentic AI (GAAI)
Generative AI: I acknowledge that I have read and will follow this policy.
Submission Number: 895