PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining

Published: 30 Aug 2023, Last Modified: 25 Oct 2023
CoRL 2023 Poster
Readers: Everyone
Keywords: Robot learning, Robotic manipulation, Visuomotor representations
TL;DR: A model architecture for robotic manipulation that is tailored to the realities of existing manipulation datasets
Abstract: A rich representation is key to general robotic manipulation, but existing approaches to representation learning require large amounts of multimodal demonstrations. In this work we propose PLEX, a transformer-based architecture that learns from a small amount of task-agnostic visuomotor trajectories and a much larger amount of task-conditioned object manipulation videos, a type of data that is available in large quantities. PLEX uses the visuomotor trajectories to induce a latent feature space and to learn task-agnostic manipulation routines, while diverse video-only demonstrations teach PLEX how to plan in that latent feature space for a wide variety of tasks. Experiments showcase PLEX's generalization on Meta-World and its state-of-the-art performance in challenging Robosuite environments. In particular, using relative positional encoding in PLEX's transformers greatly improves performance in low-data regimes when learning from human-collected demonstrations.
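The abstract credits relative positional encoding with much of PLEX's advantage in low-data regimes but does not say which variant is used. As a hedged illustration only, the sketch below shows one common form: a scalar bias, indexed by the clipped distance between query and key, added to the attention scores of a single-head causal self-attention layer in place of absolute position embeddings. The causal mask, the bias-table layout, the toy numbers, and the names `relative_bias` and `causal_attention_rel` are assumptions for illustration, not PLEX's implementation.

```python
import math
from typing import List

Vector = List[float]


def relative_bias(i: int, j: int, bias_table: List[float]) -> float:
    """Additive bias for query position i attending to key position j <= i.

    Assumption (not from PLEX): the bias depends only on the distance i - j,
    and distances beyond len(bias_table) - 1 share the last entry.
    """
    distance = min(i - j, len(bias_table) - 1)
    return bias_table[distance]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def causal_attention_rel(queries: List[Vector], keys: List[Vector],
                         values: List[Vector], bias_table: List[float]) -> List[Vector]:
    """Single-head causal self-attention with a relative-position bias.

    No absolute position embedding is added to the inputs: sequence order
    enters only through relative_bias(i, j).
    """
    d = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        # Causal mask (assumed): step i attends only to steps 0..i.
        scores = [dot(q, keys[j]) / math.sqrt(d) + relative_bias(i, j, bias_table)
                  for j in range(i + 1)]
        m = max(scores)  # subtract the max before exp for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the attention-weighted average of the visible value vectors.
        outputs.append([sum(w * values[j][k] for j, w in enumerate(weights))
                        for k in range(len(values[0]))])
    return outputs


# Toy 4-step sequence of 2-d embeddings (made-up numbers, not PLEX data).
q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
bias_table = [0.5, 0.0, -0.5]  # hypothetical biases for distances 0, 1, >=2
out = causal_attention_rel(q, k, v, bias_table)
print(out[0])  # [1.0, 2.0]: the first step can only attend to itself
```

Because the bias depends only on i - j, a pattern such as "attend mostly to the most recent step" is stored once in `bias_table` and applies at every position of a trajectory, instead of being learned separately for each absolute index. This is one commonly cited reason relative encodings can be more sample-efficient; the paper's own explanation may differ.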
Student First Author: yes
Supplementary Material: zip
Poster Spotlight Video: mp4