Keywords: Home Appliance Operation, Structured Model for Decision Making, Foundation Models for Robotics
TL;DR: Teaching robots to accurately operate home appliances by reading user manuals
Abstract: Operating home appliances, among the most common tools in every
household, is a critical capability for assistive home robots. This paper presents
ApBot, a robot system that operates novel household appliances by “reading” their
user manuals. ApBot faces multiple challenges: (i) inferring goal-conditioned partial
policies from the unstructured, textual descriptions in a user manual,
(ii) grounding the policies to the appliance in the physical world, and (iii) executing
the policies reliably over potentially many steps, despite compounding errors. To
tackle these challenges, ApBot constructs a structured, symbolic model of an appliance from its manual, with the help of a large vision-language model (VLM). It
grounds the symbolic actions visually to control panel elements. Finally, ApBot
closes the loop by updating the model based on visual feedback. Our experiments show that across a wide range of simulated and real-world appliances, ApBot achieves consistent and statistically significant improvements in task success
rate, compared with state-of-the-art large VLMs used directly as control policies.
These results suggest that a structured internal representation plays an important
role in robust robot operation of home appliances, especially complex ones.
Supplementary Material: zip
Spotlight: mp4
Submission Number: 570