SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning

Published: 30 Aug 2023 · Last Modified: 03 Jul 2024 · CoRL 2023 Oral · Readers: Everyone
Keywords: robot task planning, large language models, semantic search, LLM-based planning, 3D scene graphs
TL;DR: Combining large language models and 3D Scene Graphs for scalable robotic task planning in expansive multi-floor and multi-room environments, using semantic search and iterative re-planning.
Abstract: Large language models (LLMs) have demonstrated impressive results in developing generalist planning agents for diverse tasks. However, grounding these plans in expansive, multi-floor, and multi-room environments presents a significant challenge for robotics. We introduce SayPlan, a scalable approach to LLM-based, large-scale task planning for robotics using 3D scene graph (3DSG) representations. To ensure the scalability of our approach, we: (1) exploit the hierarchical nature of 3DSGs to allow LLMs to conduct a "semantic search" for task-relevant subgraphs from a smaller, collapsed representation of the full graph; (2) reduce the planning horizon for the LLM by integrating a classical path planner; and (3) introduce an "iterative replanning" pipeline that refines the initial plan using feedback from a scene graph simulator, correcting infeasible actions and avoiding planning failures. We evaluate our approach on two large-scale environments spanning up to 3 floors and 36 rooms with 140 assets and objects, and show that our approach is capable of grounding large-scale, long-horizon task plans from abstract, natural language instructions for a mobile manipulator robot to execute. We provide real robot video demonstrations on our project page.
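The two scalability mechanisms described in the abstract can be sketched in a few dozen lines. The following is a hypothetical toy illustration, not the authors' code: the LLM is replaced by simple keyword heuristics (`llm_select_room`, `naive_planner`), the scene graph and all object names are invented, and the "simulator" only checks that a plan's target objects exist in the expanded subgraph.

```python
# Toy sketch of SayPlan's two phases: (1) semantic search over a collapsed
# 3D scene graph, (2) iterative replanning against a scene-graph simulator.
# All names, the graph contents, and the heuristics are illustrative.

SCENE_GRAPH = {
    "floor1": {"kitchen": ["fridge", "coffee_mug"], "lobby": ["sofa"]},
    "floor2": {"office": ["desk", "stapler"]},
}

def llm_select_room(task, candidates):
    # Stand-in for the LLM: prefer a room whose name appears in the task.
    for floor, room in candidates:
        if room in task:
            return floor, room
    return candidates[0]

def semantic_search(task, graph):
    # Start from the collapsed graph (rooms only) and expand rooms the
    # "LLM" deems relevant until an object named in the task is exposed.
    candidates = [(f, r) for f, rooms in graph.items() for r in rooms]
    subgraph = {}
    while candidates:
        floor, room = llm_select_room(task, candidates)
        candidates.remove((floor, room))
        subgraph[(floor, room)] = graph[floor][room]
        if any(obj in task for obj in graph[floor][room]):
            return subgraph  # task-relevant subgraph found
    return subgraph

def simulate(plan, subgraph):
    # Toy scene-graph simulator: an action is feasible only if its target
    # object exists in the expanded subgraph; otherwise return feedback.
    objects = {o for objs in subgraph.values() for o in objs}
    for step, (action, target) in enumerate(plan):
        if target not in objects:
            return f"step {step}: '{target}' not found"
    return None  # plan is feasible

def iterative_replan(task, graph, propose_plan, max_iters=5):
    # Refine the plan with simulator feedback until it becomes feasible.
    subgraph = semantic_search(task, graph)
    feedback = None
    for _ in range(max_iters):
        plan = propose_plan(task, subgraph, feedback)
        feedback = simulate(plan, subgraph)
        if feedback is None:
            return plan
    raise RuntimeError("no feasible plan found")

def naive_planner(task, subgraph, feedback):
    # Stub LLM planner: first guesses a wrong object name, then uses the
    # simulator feedback to correct it on the next iteration.
    if feedback is None:
        return [("pickup", "mug")]
    return [("pickup", "coffee_mug")]
```

For example, `iterative_replan("bring me the coffee_mug", SCENE_GRAPH, naive_planner)` expands only the kitchen node, rejects the infeasible first plan, and returns the corrected one. The point of the design is that the full graph never enters the LLM context: only the collapsed representation plus the expanded, task-relevant subgraph do.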
Student First Author: no
Supplementary Material: zip
Instructions: I have read the instructions for authors.
Publication Agreement: pdf
Community Implementations: 2 code implementations (CatalyzeX)