OKAMI: Teaching Humanoid Robots Manipulation Skills through Single Video Imitation

Published: 05 Sept 2024, Last Modified: 05 Sept 2024, CoRL 2024, CC BY 4.0
Keywords: Humanoid Manipulation, Imitation From Videos, Motion Retargeting
Abstract: We study the problem of teaching humanoid robots manipulation skills by watching a single human video. To tackle this problem, we investigate an object-aware retargeting approach, in which a humanoid robot mimics the human motions in the video while adapting to object locations during deployment. We introduce OKAMI, an algorithm that generates a reference plan from a single RGB-D video and derives a policy that follows the plan to complete the task. OKAMI sheds light on deploying humanoid robots in everyday environments, where a robot can quickly adapt to a new task given a single human video. Our experiments show that OKAMI outperforms the baseline by 58.33%, while showcasing systematic generalization across varying visual and spatial conditions. More videos can be found in the supplementary materials and on the project website: https://sites.google.com/view/okami-corl2024.
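To make the two-stage structure described in the abstract concrete, below is a minimal Python sketch of an object-aware retargeting pipeline: a reference plan is built once from the demo video, then followed at deployment while warping toward the object's current location. Every helper name here (extract_object_trajectory, retarget_human_motion, locate_object, adapt_to_object, robot.move_to) is a hypothetical stand-in for illustration, not the authors' API.

from dataclasses import dataclass

@dataclass
class PlanStep:
    robot_pose: list   # retargeted whole-body configuration at this step
    object_pose: list  # object pose observed at this step of the demo

def build_reference_plan(rgbd_video):
    """Stage 1 (sketch): turn a single RGB-D human video into a reference
    plan by pairing retargeted human motion with observed object poses."""
    object_poses = extract_object_trajectory(rgbd_video)  # hypothetical perception helper
    robot_poses = retarget_human_motion(rgbd_video)       # hypothetical retargeting helper
    return [PlanStep(r, o) for r, o in zip(robot_poses, object_poses)]

def follow_plan(plan, robot, camera):
    """Stage 2 (sketch): at deployment, warp each planned pose toward the
    object's current location so the motion adapts to a new scene layout."""
    current_object = locate_object(camera)                # hypothetical perception helper
    for step in plan:
        # Re-express the planned pose relative to where the object is now,
        # rather than where it was in the video.
        target = adapt_to_object(step.robot_pose, step.object_pose,
                                 current_object)          # hypothetical warping helper
        robot.move_to(target)

The key design point this sketch illustrates is that the human motion is reused verbatim from the single demonstration, while only the object-relative transform is recomputed at test time, which is what permits generalization across spatial layouts.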
Supplementary Material: zip
Submission Number: 644