Plan Diffuser: Grounding LLM Planners with Diffusion Models for Robotic Manipulation

S P Sharan; Ruihan Zhao; Ufuk Topcu; Zhangyang Wang; Sandeep P. Chinchali

Published: 03 Nov 2023, Last Modified: 10 Jan 2024, CRL_WS Oral

Keywords: Closed Loop Planning, Large Language Models, Diffusion Models

TL;DR: Plan Diffuser is a closed-loop planner that actively conditions upon the visual state of the environment throughout the planning process to enhance its contextual awareness and strengthen its grounding.

Abstract: Embodied AI research is increasingly exploring large language models (LLMs) for effective planning in robotics. Recent advances have enabled LLMs to decompose a visual observation and a high-level goal prompt into executable sub-tasks. However, these existing methods often plan entirely from the initial state of the environment, which weakens grounding when generating longer plans. Some recent lines of research close the loop by incorporating environmental feedback in the form of language. Unlike these methods, we introduce Plan Diffuser, a novel "closed-loop" approach for step-by-step planning with visual feedback at each step of the loop. Specifically, our method autoregressively employs an LLM to generate single-step text subgoals and a diffusion model to translate these into visual subgoals, which are used for subsequent planning. Finally, a goal-conditioned policy that can translate these subgoal images into robotic control actions executes them. Comprehensive evaluations on the Ravens benchmark suite reveal that Plan Diffuser surpasses state-of-the-art methods, particularly in long-horizon tasks. Furthermore, our approach demonstrates robust generalization in out-of-distribution scenarios, handling unseen colors, objects, and increased task complexity with ease. We look forward to open-sourcing our code upon acceptance.
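The loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`plan_closed_loop`, `toy_llm`, `toy_diffusion`, `toy_policy`), the tuple-based stand-in for images, and the choice to run the policy after planning rather than interleaved with it are all assumptions made for illustration, since the abstract does not specify these details.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for an image: a tuple of block names already placed.
State = Tuple[str, ...]


def plan_closed_loop(
    goal: State,
    initial_obs: State,
    propose_subgoal: Callable[[State, State], Optional[str]],
    imagine_subgoal: Callable[[State, str], State],
    max_steps: int = 10,
) -> Tuple[List[str], List[State]]:
    """Alternate between a text planner and a visual subgoal generator.

    propose_subgoal plays the role of the LLM (one text subgoal per call);
    imagine_subgoal plays the role of the diffusion model (text -> image).
    Each generated image becomes the visual state for the next planning step.
    """
    text_plan: List[str] = []
    image_plan: List[State] = []
    visual_state = initial_obs
    for _ in range(max_steps):
        subgoal = propose_subgoal(goal, visual_state)
        if subgoal is None:  # planner signals that the goal is reached
            break
        visual_state = imagine_subgoal(visual_state, subgoal)
        text_plan.append(subgoal)
        image_plan.append(visual_state)
    return text_plan, image_plan


# Toy components (assumptions, not real models).
def toy_llm(goal: State, visual_state: State) -> Optional[str]:
    remaining = [block for block in goal if block not in visual_state]
    return f"place {remaining[0]}" if remaining else None


def toy_diffusion(visual_state: State, subgoal: str) -> State:
    block = subgoal.split(" ", 1)[1]
    return visual_state + (block,)


def toy_policy(observation: State, goal_image: State) -> State:
    # A goal-conditioned policy would output robot actions; here it simply
    # reaches the requested subgoal image.
    return goal_image


text_plan, image_plan = plan_closed_loop(
    goal=("red", "green", "blue"),
    initial_obs=("red",),
    propose_subgoal=toy_llm,
    imagine_subgoal=toy_diffusion,
)

# Execute the visual subgoals one by one with the toy policy.
observation: State = ("red",)
for goal_image in image_plan:
    observation = toy_policy(observation, goal_image)
```

In this toy setting the planner only ever sees the most recent visual state, which is the property the abstract contrasts with planning entirely from the initial observation.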

Submission Number: 10
