Vision-based Manipulation from Single Human Video with Open-World Object Graphs

Published: 10 Nov 2024 · Last Modified: 10 Nov 2024 · CoRL 2024 X-Embodiment Workshop Poster · CC BY 4.0
Keywords: Robot manipulation, Imitation from human videos.
TL;DR: We study how to teach robots a new task from a single, action-free human video demonstration.
Abstract: We present an object-centric approach that empowers robots to learn vision-based manipulation skills from human videos. We investigate the problem of imitating robot manipulation from a single human video in the open-world setting, where the robot must learn to manipulate novel objects from one video demonstration. We introduce ORION, an algorithm that tackles the problem by extracting an object-centric manipulation plan from a single RGB-D video and deriving a policy that conditions on the extracted plan. ORION enables the robot to learn from videos captured by everyday mobile devices such as an iPad and to generalize the policies to deployment environments with varying visual backgrounds, camera angles, spatial layouts, and novel object instances. We systematically evaluate ORION on both short-horizon and long-horizon tasks, demonstrating its efficacy in learning from a single human video in the open world.
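To make the abstract's pipeline more concrete, here is a minimal, hypothetical sketch of what an object-centric plan and its retargeting to a new scene might look like. All names and data structures below (ObjectNode, ObjectGraph, ManipulationPlan, retarget_plan) are illustrative assumptions, not ORION's actual API, and the retargeting step is a simple stand-in for the method described in the paper.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ObjectNode:
    """An object detected in the demo video, tracked as 3D keypoints."""
    name: str              # open-vocabulary label, e.g. "mug" (assumed)
    keypoints: np.ndarray  # (T, K, 3) keypoint trajectory over T frames


@dataclass
class ObjectGraph:
    """One keyframe of the plan: object nodes plus pairwise relations."""
    nodes: list[ObjectNode]
    edges: list[tuple[int, int]]  # indices of nodes in contact (assumed)


@dataclass
class ManipulationPlan:
    """Sequence of object graphs extracted from a single human video."""
    keyframes: list[ObjectGraph]


def retarget_plan(plan: ManipulationPlan,
                  current_keypoints: dict[str, np.ndarray]) -> list[np.ndarray]:
    """Shift each demonstrated keypoint trajectory by the offset between the
    demo's first-frame keypoints and the keypoints observed in the robot's
    current scene. A placeholder for the actual trajectory-retargeting step."""
    targets = []
    for graph in plan.keyframes:
        for node in graph.nodes:
            if node.name not in current_keypoints:
                continue  # object not found in the deployment scene
            offset = (current_keypoints[node.name].mean(axis=0)
                      - node.keypoints[0].mean(axis=0))
            targets.append(node.keypoints + offset)  # shifted target trajectory
    return targets
```

In this sketch, a downstream policy would consume the retargeted keypoint trajectories as conditioning, which is one plausible way to realize "a policy that conditions on the extracted plan" for a new spatial layout or object instance.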
Previous Publication: No
Submission Number: 15