Keywords: Foundation Models, Human-Robot Interaction, Model Learning
TL;DR: MOSAIC is a modular architecture that enables multiple home robots to collaboratively cook with humans.
Abstract: We present MOSAIC, a modular architecture for coordinating multiple robots to (a) interact with users using natural language and (b) manipulate an open vocabulary of everyday objects. MOSAIC employs modularity at several levels: it leverages multiple large-scale pre-trained models for high-level tasks such as language and image recognition, while using streamlined modules designed for low-level, task-specific control. This decomposition lets us reap the complementary benefits of foundation models and precise, more specialized models, enabling our system to scale to complex tasks that involve coordinating multiple robots and humans. First, we unit-test individual modules with 180 episodes of visuomotor picking, 60 episodes of human motion forecasting, and 46 online user evaluations of the task planner. We then extensively evaluate MOSAIC with 60 end-to-end trials. We discuss crucial design decisions, limitations of the current system, and open challenges in this domain.
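The high-level/low-level decomposition described in the abstract can be illustrated with a minimal Python sketch: a planner turns an instruction into subtasks, and a coordinator routes each subtask to a specialized skill module on a named robot. This is not MOSAIC's implementation. All names here (`Subtask`, `TaskPlanner`, `StubPlanner`, `Coordinator`, `pick`, `arm_1`, `arm_2`) are hypothetical; the planner is a fixed stub standing in for a pre-trained language-model planner, and the skill returns a string instead of commanding a robot.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol


@dataclass
class Subtask:
    """One plan step: which skill to run, on which robot, with which object."""
    skill: str
    robot: str
    target: str


class TaskPlanner(Protocol):
    """High-level module; a real system might wrap a pre-trained language model here."""
    def plan(self, instruction: str) -> List[Subtask]: ...


class StubPlanner:
    """Hypothetical stand-in for a foundation-model planner: ignores the input, returns a fixed plan."""
    def plan(self, instruction: str) -> List[Subtask]:
        return [
            Subtask(skill="pick", robot="arm_1", target="tomato"),
            Subtask(skill="pick", robot="arm_2", target="bowl"),
        ]


# A low-level skill takes (robot, target) and reports what it did (placeholder, no real control).
Skill = Callable[[str, str], str]


def pick(robot: str, target: str) -> str:
    return f"{robot} picked {target}"


class Coordinator:
    """Routes each planned subtask to the specialized skill module registered for it."""
    def __init__(self, planner: TaskPlanner, skills: Dict[str, Skill]) -> None:
        self.planner = planner
        self.skills = skills

    def run(self, instruction: str) -> List[str]:
        log: List[str] = []
        for step in self.planner.plan(instruction):
            if step.skill not in self.skills:
                raise KeyError(f"no module registered for skill '{step.skill}'")
            log.append(self.skills[step.skill](step.robot, step.target))
        return log


coordinator = Coordinator(StubPlanner(), {"pick": pick})
print(coordinator.run("make a salad"))  # ['arm_1 picked tomato', 'arm_2 picked bowl']
```

Keeping the planner behind a small interface means the stub could be swapped for a language-model planner, or the placeholder skill for a learned visuomotor policy, without touching the coordinator.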
Supplementary Material: zip
Spotlight Video: mp4
Video: https://youtu.be/jKp-RqNlW90
Website: https://portal-cornell.github.io/MOSAIC/
Code: https://github.com/portal-cornell/MOSAIC/
Publication Agreement: pdf
Student Paper: yes
Submission Number: 480