Keywords: Embodied instruction following, Scene graphs, LLM-based planning
TL;DR: We propose LookPlanGraph -- a novel approach that leverages hierarchical scene graphs and dynamically augments them during task execution.
Abstract: Approaches that use Large Language Models as planners for robotic tasks have recently become widespread. In such systems, the LLM must be grounded in the robot's operating environment to complete tasks successfully. One way to achieve this is with a scene graph that contains all the information needed for the task, including the presence and location of objects. In this paper, we propose an approach that starts from a scene graph containing only static, immobile objects and augments it with the required movable objects during instruction following, using a vision-language model and an image from the agent's camera. We conduct thorough experiments on the SayPlan Office, BEHAVIOR-1K, and VirtualHome RobotHow datasets and demonstrate that the proposed approach handles these tasks effectively, outperforming approaches that rely on pre-built scene graphs.
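To make the augmentation step in the abstract concrete, the sketch below shows one possible way such a graph update could look. It is only illustrative, not the paper's implementation: the `Node`, `SceneGraph`, `detect_movables`, and `augment` names are hypothetical, and the VLM call is stubbed with fixed detections.

```python
from dataclasses import dataclass, field

# Illustrative sketch of scene-graph augmentation with VLM detections.
# All names below are assumptions for this example, not the paper's API.

@dataclass
class Node:
    name: str                   # e.g. "kitchen_table" or "mug"
    category: str               # "static" or "movable"
    parent: str | None = None   # static node the object is anchored to

@dataclass
class SceneGraph:
    nodes: dict[str, Node] = field(default_factory=dict)

    def add(self, node: Node) -> None:
        self.nodes[node.name] = node

def detect_movables(camera_image) -> list[str]:
    """Placeholder for a vision-language model query: return names of
    movable objects visible in the agent's camera image."""
    return ["mug", "book"]  # stubbed detections for illustration

def augment(graph: SceneGraph, camera_image, anchor: str) -> SceneGraph:
    """Add movable objects seen from the current viewpoint to the graph,
    attaching them to the static node the agent is currently inspecting."""
    for obj in detect_movables(camera_image):
        if obj not in graph.nodes:
            graph.add(Node(name=obj, category="movable", parent=anchor))
    return graph

if __name__ == "__main__":
    g = SceneGraph()
    g.add(Node(name="kitchen_table", category="static"))
    g = augment(g, camera_image=None, anchor="kitchen_table")
    print([(n.name, n.parent) for n in g.nodes.values()])
```

In this toy version, newly detected movable objects are simply attached to the static node currently in view; the actual method may use richer hierarchy and spatial reasoning than shown here.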
Submission Number: 165