Keywords: Hierarchical Imitation Learning, Image Generation, Video Prediction, Robot Learning
TL;DR: A method that improves hierarchical imitation learning systems built on pre-trained image or video generative models.
Abstract: Image and video generative models that are pre-trained on Internet-scale data can increase the generalization capacity of robot learning systems. These models can function as high-level planners, generating intermediate subgoals for low-level goal-conditioned policies to reach. However, the performance of these systems can be bottlenecked by the interface between the generative models and the low-level controllers: generative models may predict photorealistic yet physically infeasible frames, and low-level policies may be sensitive to subtle visual artifacts in generated goal images. This paper addresses these two facets of generalization, providing an interface to “glue together” language-conditioned image or video prediction models with low-level goal-conditioned policies. Our method, Generative Hierarchical Imitation Learning-Glue (GHIL-Glue), filters out subgoals that do not lead to task progress and improves the robustness of goal-conditioned policies to generated subgoals with harmful visual artifacts. GHIL-Glue achieves a new state of the art on the CALVIN simulation benchmark for policies using observations from a single RGB camera. GHIL-Glue also outperforms other generalist robot policies on 3 of 4 language-conditioned manipulation tasks testing zero-shot generalization on a physical robot.
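The abstract describes a generate-filter-execute loop: a generative model proposes candidate subgoal images, a filter scores them for task progress, and a goal-conditioned policy acts toward the selected subgoal. The sketch below is a minimal illustration of that loop, not the paper's actual implementation; the objects and function names (`generator.sample`, `filter_model.progress_score`, `policy.act`) are hypothetical placeholders for the assumed interfaces.

```python
# Minimal sketch of a generate-filter-execute control step for
# hierarchical imitation learning with a generative high-level planner.
# All object interfaces and function names are hypothetical placeholders.

def hierarchical_control_step(obs, instruction,
                              generator, filter_model, policy,
                              num_candidates=8):
    """Propose candidate subgoal images, keep the one judged most likely
    to make task progress, and pass it to the low-level policy."""
    # High-level: sample candidate subgoal images from a pre-trained
    # language-conditioned image/video generative model.
    candidates = [generator.sample(obs, instruction)
                  for _ in range(num_candidates)]

    # Filter: score each candidate for task progress, so physically
    # infeasible or off-task generations are discarded.
    scores = [filter_model.progress_score(obs, goal, instruction)
              for goal in candidates]
    best_subgoal = candidates[max(range(num_candidates),
                                  key=lambda i: scores[i])]

    # Low-level: a goal-conditioned policy outputs an action that moves
    # the robot toward the selected subgoal image.
    return policy.act(obs, goal_image=best_subgoal)
```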
Submission Number: 42