Keywords: generalization, robotic manipulation, language augmentation
TL;DR: We present STEER, a framework that extracts flexible low-level skills from existing datasets; these skills can be combined by humans or VLMs to handle more complex situations without any additional data collection or fine-tuning.
Abstract: Recent advances have showcased the potential of leveraging the broad semantic understanding learned by vision-language models (VLMs) in robot learning; however, connecting VLMs effectively to robot control remains an open question, since physical robot data is relatively sparse and narrow compared to internet-scale VLM training data.
We propose STEER, a system for bridging this gap by learning flexible, low-level manipulation skills that can be modulated or repurposed to adapt to new situations. We show that training low-level policies on structured, dense re-annotations of existing robot datasets exposes an intuitive and flexible interface through which humans or VLMs can guide them in unfamiliar scenarios or perform new tasks using common-sense reasoning. We demonstrate that the skills learned via STEER can be combined to synthesize novel behaviors and achieve held-out tasks without additional training. Videos at https://steer-anon.github.io/
Submission Number: 37