Open-World Object Manipulation using Pre-Trained Vision-Language Models

Published: 30 Aug 2023, Last Modified: 24 Oct 2023, CoRL 2023 Poster, Readers: Everyone
Abstract: For robots to follow instructions from people, they must be able to connect the rich semantic information in human vocabulary, e.g., "can you get me the pink stuffed whale?", to their sensory observations and actions. This poses a notably difficult challenge: while robot learning approaches allow robots to learn many different behaviors from first-hand experience, it is impractical for robots to have first-hand experiences that span all of this semantic information. We would like a robot's policy to be able to perceive and pick up the pink stuffed whale, even if it has never seen any data of interactions with a stuffed whale before. Fortunately, static data on the internet contains vast semantic information, and this information is captured in pre-trained vision-language models. In this paper, we study whether we can interface robot policies with these pre-trained models, with the aim of allowing robots to complete instructions involving object categories that the robot has never seen first-hand. We develop a simple approach, which we call Manipulation of Open-World Objects (MOO), which leverages a pre-trained vision-language model to extract object-identifying information from the language command and image, and conditions the robot policy on the current image, the instruction, and the extracted object information. In a variety of experiments on a real mobile manipulator, we find that MOO generalizes zero-shot to a wide range of novel object categories and environments. In addition, we show how MOO generalizes to other, non-language-based input modalities for specifying the object of interest, such as finger pointing, and how it can be further extended to enable open-world navigation and manipulation. The project’s website and evaluation videos can be found at
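The conditioning scheme described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the extracted object information is encoded, so the sketch assumes one possible encoding: a single-pixel mask marking the object's location, passed to the policy alongside the image and the instruction. All names here (`PolicyInput`, `make_object_mask`, `build_policy_input`, `toy_locator`) are hypothetical, and `toy_locator` is only a stand-in for a pre-trained vision-language model, not a real detector.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for a camera frame: an H x W grid of intensities.
Image = List[List[float]]
# Hypothetical locator signature: (image, instruction) -> (row, col) or None.
Locator = Callable[[Image, str], Optional[Tuple[int, int]]]


@dataclass
class PolicyInput:
    """Everything the policy is conditioned on in this sketch."""
    image: Image
    instruction: str
    object_mask: Image  # assumed encoding: 1.0 at the object location, else 0.0


def make_object_mask(height: int, width: int,
                     center: Optional[Tuple[int, int]]) -> Image:
    """Build an H x W mask with a single 1.0 at `center` (all zeros if None)."""
    mask = [[0.0] * width for _ in range(height)]
    if center is not None:
        row, col = center
        if 0 <= row < height and 0 <= col < width:
            mask[row][col] = 1.0
    return mask


def build_policy_input(image: Image, instruction: str,
                       locate_object: Locator) -> PolicyInput:
    """Query the locator, then bundle image, instruction, and object mask."""
    height, width = len(image), len(image[0])
    center = locate_object(image, instruction)
    return PolicyInput(image, instruction,
                       make_object_mask(height, width, center))


def toy_locator(image: Image, instruction: str) -> Optional[Tuple[int, int]]:
    """Placeholder for the vision-language model: returns the brightest pixel.

    It ignores the instruction entirely; a real model would use it to decide
    which object to localize.
    """
    best, best_val = None, 0.0
    for r, row in enumerate(image):
        for c, val in enumerate(row):
            if val > best_val:
                best, best_val = (r, c), val
    return best
```

Because the locator is passed in as a function, the same `build_policy_input` could in principle accept a different source of object location, such as a pointing gesture, without changing what the policy receives.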
Student First Author: no
Supplementary Material: zip
Publication Agreement: pdf
Poster Spotlight Video: mp4