DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics

Published: 17 Nov 2022, Last Modified: 05 May 2023
Venue: PRL 2022 Poster
Readers: Everyone
Keywords: Diffusion Models, Image Generation, Object Rearrangement
TL;DR: Use web-scale models like DALL-E to generate an image of a human-preferred goal state for object rearrangement, then achieve that state using a real robot.
Abstract: We introduce the first work to explore web-scale diffusion models for robotics. DALL-E-Bot enables a robot to rearrange objects in a scene in three steps: it first infers a text description of those objects, then generates an image representing a natural, human-like arrangement of them, and finally physically arranges the objects according to that image. Significantly, we achieve this zero-shot using DALL-E, without any further data collection or training. Encouraging real-world results, evaluated with human studies, show that this is a promising direction for the future of web-scale robot learning. We also propose a list of recommendations to the text-to-image community, to align further development of these models with applications to robotics. Videos are available on our webpage at: https://www.robot-learning.uk/dall-e-bot
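
The three steps named in the abstract (describe the objects, generate a goal arrangement, move the objects to match it) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the prompt wording, the `Pose` type, and the `generate_goal` and `move` callables are all hypothetical placeholders. In particular, `generate_goal` stands in for both the text-to-image model and the step that extracts object positions from the generated image, neither of which is specified here.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical 2D tabletop position (x, y); the real system's pose format is not given.
Pose = Tuple[float, float]


def build_prompt(labels: List[str]) -> str:
    """Step 1: turn object labels into a text description (wording is an assumption)."""
    return "A photo of " + ", ".join(f"a {label}" for label in labels) + " on a table"


def rearrange(
    current: Dict[str, Pose],
    generate_goal: Callable[[str], Dict[str, Pose]],
    move: Callable[[str, Pose, Pose], None],
) -> str:
    """Run the describe -> generate -> arrange loop with caller-supplied components."""
    prompt = build_prompt(sorted(current))
    # Step 2: placeholder for image generation plus goal-pose extraction.
    goal = generate_goal(prompt)
    # Step 3: move every object that also appears in the generated goal.
    for name, start in current.items():
        if name in goal:
            move(name, start, goal[name])
    return prompt


def fake_generator(prompt: str) -> Dict[str, Pose]:
    """Stand-in for the image model: returns a fixed place setting."""
    return {"fork": (-0.1, 0.0), "plate": (0.0, 0.0), "knife": (0.1, 0.0)}


# Example run with stubs: record the moves instead of commanding a robot.
moves: List[Tuple[str, Pose, Pose]] = []
scene = {"plate": (0.3, 0.2), "knife": (-0.2, 0.1), "fork": (0.0, -0.3)}
rearrange(scene, fake_generator, lambda n, s, g: moves.append((n, s, g)))
```

Passing the generator and the motion primitive in as callables keeps the sketch independent of any particular image model or robot interface, which is the sense in which the pipeline needs no task-specific training.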