Abstract: Planning and acting to solve 'real' tasks using large language models (LLMs) in interactive environments has become a new frontier for AI methods. While recent advances have allowed LLMs to interact with online tools, solve robotics tasks, and more, long-range reasoning tasks remain a problem for LLMs.
Existing methods to address this issue are resource intensive and require additional data or hand-crafted rules. Instead, we propose a simple method based solely on few-shot in-context learning that enhances 'chain-of-thought' with state tracking for planning and acting with LLMs. We show that our method establishes a new state of the art on ALFWorld among in-context-learning methods (+14\% over the previous best in-context-learning method) and performs on par with methods that use additional training data and additional tools such as code execution. We also demonstrate that our enhanced 'chain-of-states' allows the agent both to solve longer-horizon problems and to be more efficient in the number of steps required to solve a task. Finally, we conduct ablation studies and show that 'chain-of-thought' helps state-tracking accuracy, while imposing a JSON structure harms overall performance. We open-source our code and annotations at anonymous URL.
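To make the idea concrete, here is a minimal sketch of what a 'chain-of-states' few-shot example might look like: each trajectory step carries an explicit State line alongside the usual Thought/Action lines. All names, the state format, and the example task are illustrative assumptions, not the paper's exact annotation scheme.

```python
# Hypothetical sketch: a chain-of-states prompt step adds an explicit
# "State:" line (tracked environment state) to each Thought/Action step.
# The dict-based state format and the ALFWorld-style task are assumptions.

def format_step(state: dict, thought: str, action: str) -> str:
    """Render one in-context example step with explicit state tracking."""
    state_str = ", ".join(f"{k}={v}" for k, v in state.items())
    return f"State: {state_str}\nThought: {thought}\nAction: {action}"

def build_prompt(task: str, steps: list) -> str:
    """Assemble a few-shot trajectory into a single prompt string."""
    lines = [f"Task: {task}"]
    for state, thought, action in steps:
        lines.append(format_step(state, thought, action))
    return "\n".join(lines)

demo = build_prompt(
    "put a clean mug on the desk",
    [
        ({"holding": "nothing", "mug": "dirty"},
         "I need to find the mug first.", "go to countertop 1"),
        ({"holding": "mug", "mug": "dirty"},
         "The mug is dirty; I should clean it.", "clean mug with sinkbasin 1"),
    ],
)
```

At inference time, such annotated trajectories would be placed before the new task in the prompt, so the model imitates both the reasoning and the running state updates.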
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: LLM, AI Agent, Acting, State tracking, Planning, In context learning, few shot, goal tracking, long range reasoning
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 3345