HEETR: Pretraining for Robotic Manipulation on Heteromodal Data

Published: 17 Nov 2022, Last Modified: 05 May 2023
PRL 2022 Poster
Keywords: pretraining, manipulation, multimodal, transformer, imitation learning
TL;DR: We introduce a model and associated pretraining method for robotic manipulation that leverages abundant but relatively uninformative data to improve learning of new tasks from few demonstrations.
Abstract: A good representation is key to unlocking efficient learning for real-world robot manipulation. However, common manipulation-relevant datasets do not always contain all the modalities (e.g., videos, actions, proprioceptive states) present in robotic manipulation. As a result, existing approaches to representation learning, which assume that all modalities are available, cannot easily scale to consume all of the data; instead, they can only be applied to the subset of data with sufficient modalities, which limits the effectiveness of representation learning. In this work, we present an end-to-end transformer-based pretraining method called HEETR (Heteromodal End-to-End Transformer for Robotic manipulation) that learns a representation for efficient adaptation from all data, regardless of which modalities are available. We demonstrate the merits of this design and establish new state-of-the-art performance on the Robosuite/Robomimic and Meta-World benchmarks.
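The abstract does not specify HEETR's architecture, but the core idea of heteromodal pretraining, consuming data regardless of which modalities are present, can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the paper's method: the class name, modality set, feature dimensions, and tokenizer design are all hypothetical. Each available modality is tokenized and tagged with a learned modality embedding, and a shared transformer encoder attends over whatever tokens exist, so modality-complete and modality-incomplete batches pass through the same model.

```python
# Hypothetical sketch, NOT the HEETR implementation: a transformer that
# tolerates missing modalities by encoding only the tokens it receives.
import torch
import torch.nn as nn

class HeteromodalTransformer(nn.Module):
    """Encodes whichever of {video, action, proprio} features are present."""

    def __init__(self, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        # One linear tokenizer per modality (input dims are placeholders).
        self.tokenizers = nn.ModuleDict({
            "video": nn.Linear(512, d_model),
            "action": nn.Linear(7, d_model),
            "proprio": nn.Linear(13, d_model),
        })
        # Learned embedding identifying each modality to the encoder.
        self.modality_emb = nn.ParameterDict({
            k: nn.Parameter(torch.zeros(d_model)) for k in self.tokenizers
        })
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, inputs):
        # `inputs` maps modality name -> (batch, seq, feat) tensor; any
        # subset of modalities may be supplied, which is the point.
        tokens = [
            self.tokenizers[name](x) + self.modality_emb[name]
            for name, x in inputs.items()
        ]
        return self.encoder(torch.cat(tokens, dim=1))


# Usage: a modality-complete batch and a video-only batch both work.
model = HeteromodalTransformer()
full = {"video": torch.randn(2, 10, 512),
        "action": torch.randn(2, 10, 7),
        "proprio": torch.randn(2, 10, 13)}
video_only = {"video": torch.randn(2, 10, 512)}
print(model(full).shape)        # torch.Size([2, 30, 256])
print(model(video_only).shape)  # torch.Size([2, 10, 256])
```

Under this reading, pretraining objectives (and the adaptation from few demonstrations mentioned in the TL;DR) would operate on the encoder output; those details are left to the paper itself.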