Keywords: Dexterous Manipulation, Human-to-robot Learning, Reinforcement Learning
TL;DR: HuDOR closes the human-to-robot hand embodiment gap using online RL with object-centric rewards.
Abstract: Training robots directly from human videos is an emerging area in robotics and computer vision. While there has been notable progress with two-fingered grippers, learning autonomous tasks without teleoperation remains a difficult problem for multi-fingered robot hands. A key reason for this difficulty is that a policy trained on human hands may not transfer directly to a robot hand with a different morphology. In this work, we present HuDOR, a technique that enables online fine-tuning of the policy by constructing a reward function from the human video. Importantly, this reward function is built using object-oriented rewards derived from off-the-shelf point trackers, which yields meaningful learning signals even when the robot hand appears in the visual observation, despite the reward being constructed from a video of the human hand. Given a single video of a human solving a task, such as gently opening a music box, HuDOR allows our four-fingered Allegro hand to learn this task with just an hour of online interaction. Our experiments across four tasks show that HuDOR outperforms alternatives with an average 4x improvement. Code and videos are available on our website: object-rewards.github.io.
Submission Number: 15
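To illustrate the kind of object-centric reward the abstract describes, below is a minimal sketch assuming 2D object keypoint tracks have already been extracted with an off-the-shelf point tracker for both the human demo video and the robot rollout. All names here (`object_reward`, `expert_tracks`, `rollout_tracks`, `sigma`) are hypothetical placeholders, not HuDOR's actual implementation.

```python
import numpy as np


def object_reward(expert_tracks: np.ndarray,
                  rollout_tracks: np.ndarray,
                  sigma: float = 50.0) -> float:
    """Reward in (0, 1] measuring how closely the object's motion in the
    robot rollout matches the object's motion in the human video.

    expert_tracks, rollout_tracks: (T, K, 2) arrays of K tracked object
    keypoints (pixel coordinates) over T time steps. Because only object
    points are tracked, the signal ignores whatever hand is in the frame,
    so the human-vs-robot embodiment difference does not corrupt the reward.
    """
    # Compare object motion relative to the first frame, so the reward
    # reflects how the object moved rather than where it started.
    expert_motion = expert_tracks - expert_tracks[0]
    rollout_motion = rollout_tracks - rollout_tracks[0]

    # Align lengths by truncating to the shorter trajectory.
    t = min(len(expert_motion), len(rollout_motion))
    diff = expert_motion[:t] - rollout_motion[:t]

    # Mean per-point displacement error, mapped to a bounded reward.
    error = np.linalg.norm(diff, axis=-1).mean()
    return float(np.exp(-error / sigma))


if __name__ == "__main__":
    # Toy check: identical object tracks yield the maximum reward of 1.0.
    demo = np.cumsum(np.random.randn(100, 8, 2), axis=0)
    print(object_reward(demo, demo))
```

In an online RL loop, a reward of this form could be computed at the end of each robot rollout and used to fine-tune the policy, which matches the abstract's description of learning from a single human video with roughly an hour of interaction.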