STREAM: Embodied Reasoning through Code Generation

Published: 18 Jun 2024, Last Modified: 05 Sept 2024
Venue: MFM-EAI@ICML 2024 Poster
License: CC BY 4.0
Keywords: embodied ai, embodied question answering, code generation, llm, vlm
TL;DR: An LLM-based agent generates code that answers questions about the embodied experience and environment by calling an API and other models, such as VLMs.
Abstract: Recent advancements in the reasoning and code generation abilities of Large Language Models (LLMs) have provided new perspectives on Embodied AI tasks, enhancing planning for both high-level control problems and low-level manipulation. However, efficiently informing the embodied agent about the environment in a concise and task-specific manner remains a challenge. Inspired by modular visual reasoning, we propose a novel approach that utilizes code generation to ground the planner in the environmental context and enable reasoning about past agent experiences. Our modular framework allows the code-generating LLM to extract and aggregate information from relevant observations via API calls to image understanding models, including flexible VLMs. To evaluate our approach, we choose Embodied Question Answering (EQA) as a target task and develop a procedure for synthetic data collection by utilizing the ground truth states of a simulator. Our framework demonstrates notable improvements over baseline methods.
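To make the modular idea concrete, here is a minimal sketch of the kind of program such a code-generating LLM could emit for an EQA query: select stored observations and aggregate per-frame answers from an image-understanding model. The `Observation` dataclass and the `vqa` callable are assumed stand-ins for illustration, not the framework's actual API.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Observation:
    """A single stored observation from the agent's past experience (assumed structure)."""
    image: object  # placeholder for an egocentric frame (array, tensor, or path)


def answer_question(
    question: str,
    observations: List[Observation],
    vqa: Callable[[object, str], str],  # assumed VLM wrapper: (image, question) -> answer
) -> str:
    """Query the VLM on each relevant frame and return the majority-vote answer."""
    votes = Counter(vqa(obs.image, question) for obs in observations)
    return votes.most_common(1)[0][0]
```

In practice the generated program would also filter observations for task relevance before querying the VLM; the majority vote here is only one simple aggregation strategy.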
Submission Number: 13