Robi Butler: Multimodal Remote Interaction with Household Robotic Assistants

Published: 16 Apr 2024, Last Modified: 02 May 2024
Venue: MoMa WS 2024 Poster
License: CC BY 4.0
Keywords: Human-Robot Interaction, Mobile Manipulation, Service Robotics, Remote Interaction, Augmented Reality
Abstract: In this paper, we introduce Robi Butler, a novel household robotic system that enables multimodal interaction with remote users. Leveraging advanced communication interfaces, Robi Butler allows users to monitor the robot's status, give text or voice instructions, and select target objects with hand-pointing gestures. At the core of our system are a high-level behavior module, powered by Large Language Models (LLMs), that interprets received multimodal instructions to generate plans, and open-vocabulary primitives, supported by Vision-Language Models (VLMs), that execute the planned actions with text and pointing queries. The integration of these components allows Robi Butler to ground remote multimodal instructions in real-world home environments in a zero-shot manner. We demonstrate the efficacy and efficiency of this system on a variety of daily household tasks involving remote users, such as question answering via interactive mobile manipulation and object disambiguation for manipulation through gesture.
Submission Number: 12