RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches

Published: 26 Jun 2024, Last Modified: 09 Jul 2024
Venue: DGR @ RSS 2024 (Poster)
License: CC BY 4.0
Keywords: Visual Imitation Learning, Goal-Conditioned Manipulation
TL;DR: We introduce a novel framework for goal-conditioned imitation learning based on hand-drawn sketches, which offer convenience without compromising on goal specificity.
Abstract: Natural language and images are commonly used as goal representations in goal-conditioned imitation learning. However, language can be ambiguous and images can be over-specified. In this work, we study hand-drawn sketches as a modality for goal specification. Sketches are easy to provide on the fly like language, but like images they can also help a downstream policy to be spatially aware. By virtue of being minimal, sketches can further help disambiguate task-relevant from irrelevant objects. We present RT-Sketch, a goal-conditioned policy for manipulation that takes a hand-drawn sketch of the desired scene as input and outputs actions. We train RT-Sketch on a dataset of trajectories paired with synthetically generated goal sketches. We evaluate this approach on six manipulation skills involving tabletop object rearrangements on an articulated countertop. Experimentally, we find that RT-Sketch performs comparably to image- or language-conditioned agents in straightforward settings, while achieving greater robustness when language goals are ambiguous or visual distractors are present. Additionally, we show that RT-Sketch handles sketches with varied levels of specificity, ranging from minimal line drawings to detailed, colored drawings.
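The abstract describes a goal-conditioned policy that maps a current observation and a goal sketch to actions. A minimal sketch of that interface is shown below; all names (`GoalConditionedPolicy`, `act`, the linear backbone and its weights) are illustrative assumptions, not RT-Sketch's actual architecture or API, which the paper does not detail here.

```python
from dataclasses import dataclass
from typing import Callable, List

Action = List[float]

@dataclass
class GoalConditionedPolicy:
    # Hypothetical backbone: any function of the concatenated
    # observation/goal features (in RT-Sketch this would be a learned
    # network; here it is a stand-in placeholder).
    backbone: Callable[[List[float]], Action]

    def act(self, observation: List[float], goal_sketch: List[float]) -> Action:
        # Condition on the goal by concatenating its features with the
        # current observation before running the backbone.
        return self.backbone(observation + goal_sketch)

def linear_backbone(features: List[float]) -> Action:
    # Arbitrary placeholder weights, purely for illustration.
    weights = [0.5, -0.25, 1.0, 0.1]
    return [sum(w * f for w, f in zip(weights, features))]

policy = GoalConditionedPolicy(backbone=linear_backbone)
action = policy.act(observation=[1.0, 2.0], goal_sketch=[0.5, -1.0])
```

The key design point the abstract emphasizes is the goal modality: the same conditioning interface accepts a sketch where other systems would pass a goal image or a language embedding.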
Supplementary Material: zip
Submission Number: 20