MOSAIC: A Modular System for Assistive and Interactive Cooking

Published: 16 Apr 2024, Last Modified: 02 May 2024
Venue: MoMa WS 2024 Oral
License: CC BY 4.0
Keywords: Foundation Models, Human-Robot Interaction, Robot Learning
TL;DR: MOSAIC is a modular architecture that enables multiple home robots to collaboratively cook with humans.
Abstract: We present MOSAIC, a modular architecture for home robots to perform complex collaborative tasks, such as cooking with everyday users. MOSAIC collaborates closely with humans, interacts with users in natural language, coordinates multiple robots, and manages an open vocabulary of everyday objects. At its core, MOSAIC employs modularity: it leverages multiple large-scale pre-trained models for general tasks like language and image recognition, while using streamlined modules designed for task-specific control. We evaluate MOSAIC in 60 end-to-end trials in which two robots collaborate with a human user to cook a combination of recipes. We also test individual modules extensively, with 180 episodes of visuomotor picking, 60 episodes of human motion forecasting, and 46 online user evaluations of the task planner. We show that MOSAIC can collaborate efficiently with humans, interpret and execute complex tasks, and adapt to new tasks with minimal reconfiguration. Finally, we discuss limitations of the current system and open challenges in this domain. The project website is at https://portal.cs.cornell.edu/MOSAIC/.
Submission Number: 2