Keywords: Embodied instruction following, Scene graphs, LLM-based planning
TL;DR: We propose LookPlanGraph -- a novel approach that leverages hierarchical scene graphs and dynamically augments them during task execution.
Abstract: Approaches that use Large Language Models as planners for robotic tasks have recently become widespread. In such systems, the LLM must be grounded in the robot's operating environment to complete tasks successfully. One way to achieve this is with a scene graph that contains all the information needed for the task, including the presence and location of objects. In this paper, we propose an approach that starts from a scene graph containing only static, immobile objects and augments it with the required movable objects during instruction following, using a vision-language model and an image from the agent's camera. We conduct thorough experiments on the SayPlan Office, BEHAVIOR-1K, and VirtualHome RobotHow datasets and demonstrate that the proposed approach handles these tasks effectively, outperforming approaches that rely on pre-built scene graphs.
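To make the augmentation step in the abstract concrete, the sketch below shows one possible way such a graph update could look. It is only illustrative, not the paper's implementation: the `Node`, `SceneGraph`, `detect_movables`, and `augment` names are hypothetical, and the VLM call is stubbed with fixed detections.

```python
from dataclasses import dataclass, field

# Illustrative sketch of scene-graph augmentation with VLM detections.
# All names below are assumptions for this example, not the paper's API.

@dataclass
class Node:
    name: str                   # e.g. "kitchen_table" or "mug"
    category: str               # "static" or "movable"
    parent: str | None = None   # static node the object is anchored to

@dataclass
class SceneGraph:
    nodes: dict[str, Node] = field(default_factory=dict)

    def add(self, node: Node) -> None:
        self.nodes[node.name] = node

def detect_movables(camera_image) -> list[str]:
    """Placeholder for a vision-language model query: return names of
    movable objects visible in the agent's camera image."""
    return ["mug", "book"]  # stubbed detections for illustration

def augment(graph: SceneGraph, camera_image, anchor: str) -> SceneGraph:
    """Add movable objects seen from the current viewpoint to the graph,
    attaching them to the static node the agent is currently inspecting."""
    for obj in detect_movables(camera_image):
        if obj not in graph.nodes:
            graph.add(Node(name=obj, category="movable", parent=anchor))
    return graph

if __name__ == "__main__":
    g = SceneGraph()
    g.add(Node(name="kitchen_table", category="static"))
    g = augment(g, camera_image=None, anchor="kitchen_table")
    print([(n.name, n.parent) for n in g.nodes.values()])
```

In this toy version, newly detected movable objects are simply attached to the static node currently in view; the actual method may use richer hierarchy and spatial reasoning than shown here.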
Submission Number: 165