OPEx: A Large Language Model-Powered Framework for Embodied Instruction Following

Haochen Shi; Zhiyuan Sun; Xingdi Yuan; Marc-Alexandre Côté; Bang Liu

Published: 01 Jan 2024, Last Modified: 15 May 2025. Venue: AAMAS 2024. License: CC BY-SA 4.0

Abstract: Embodied Instruction Following (EIF) is crucial for understanding natural language in a practical context, requiring agents to follow verbal instructions to complete complex tasks. Traditionally, EIF relies heavily on expert annotations for learning, which are costly and sometimes unattainable. Recent research shows that Large Language Models (LLMs) can use their reasoning ability to help in EIF with minimal examples, but applying LLMs directly faces issues such as hallucinations and partially observable environments. To bridge this gap, we introduce OPEx, a new LLM-based method for EIF that needs far less task-specific data. OPEx uses three LLMs in different roles: observing to gather environment data, planning by breaking down instructions, and executing tasks with learned skills. Our tests reveal that OPEx significantly outperforms the FILM baseline while using 90% less training data for planning tasks, and achieves up to a 38% performance gain when FILM is trained on identical data.
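The observe, plan, execute division of roles described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `LLM` callable type, the prompt wording, the functions `observer`, `planner`, `executor`, and `run_episode`, and the `stub_llm` stand-in that replaces a real model. For brevity the sketch passes one callable to all three roles, whereas OPEx uses three LLMs.

```python
from typing import Callable, List

# Hypothetical type: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]


def observer(llm: LLM, raw_observation: str) -> str:
    """Observer role: condense raw environment data into a short summary."""
    return llm(f"Summarize the scene: {raw_observation}").strip()


def planner(llm: LLM, instruction: str, summary: str) -> List[str]:
    """Planner role: break the instruction into subtasks, one per line."""
    reply = llm(f"Instruction: {instruction}\nScene: {summary}\nList subtasks.")
    return [line.strip() for line in reply.splitlines() if line.strip()]


def executor(llm: LLM, subtask: str, summary: str) -> str:
    """Executor role: choose a skill to carry out one subtask."""
    return llm(f"Subtask: {subtask}\nScene: {summary}\nChoose a skill.").strip()


def run_episode(llm: LLM, instruction: str, raw_observation: str) -> List[str]:
    """Chain the three roles: observe, then plan, then execute each subtask."""
    summary = observer(llm, raw_observation)
    subtasks = planner(llm, instruction, summary)
    return [executor(llm, subtask, summary) for subtask in subtasks]


def stub_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model (illustration only)."""
    if prompt.startswith("Summarize"):
        return "mug on table"
    if prompt.startswith("Instruction"):
        return "find mug\npick up mug\n"
    # Executor prompt: echo the subtask from the first line as a skill name.
    first_line = prompt.splitlines()[0]
    return "skill:" + first_line[len("Subtask: "):]


if __name__ == "__main__":
    print(run_episode(stub_llm, "put the mug in the sink", "table, mug"))
```

Separating the roles this way keeps each prompt small and lets the planner work from a text summary rather than raw observations; how OPEx itself structures prompts and skills is not specified in the abstract.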
