Bridging the Human to Robot Dexterity Gap through Object-Oriented Rewards

Published: 26 Oct 2024, Last Modified: 10 Nov 2024 · LFDM · CC BY 4.0
Keywords: Human-to-Robot; Dexterous Manipulation
TL;DR: We present HuDOR, a technique that enables multi-fingered robot hands to autonomously learn tasks from human videos by using object-oriented rewards, allowing for online fine-tuning and efficient policy transfer despite morphological differences.
Abstract: Training robots directly from human videos is an emerging area in robotics and computer vision. While there has been notable progress with two-fingered grippers, learning autonomous tasks without teleoperation remains a difficult problem for multi-fingered robot hands. A key reason for this difficulty is that a policy trained on human hands may not transfer directly to a robot hand with a different morphology. In this work, we present HuDOR, a technique that enables online fine-tuning of the policy by constructing a reward function from the human video. Importantly, this reward function is built from object-oriented rewards derived from off-the-shelf point trackers, which provide meaningful learning signals even when the robot hand appears in the visual observations, although the reward itself was constructed from the human hand. Given a single video of a human solving a task, such as gently opening a music box, HuDOR allows our four-fingered Allegro hand to learn the task with just an hour of online interaction. Our experiments across four tasks show that HuDOR outperforms alternatives by an average of 4$\times$.
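To make the object-oriented reward idea concrete, here is a minimal sketch that scores a robot rollout by how closely its tracked object motion matches the object motion in the human demonstration. It assumes both videos have been processed by an off-the-shelf point tracker into arrays of 2D object point tracks; the function name `object_trajectory_reward`, the displacement-matching formulation, and the array shapes are illustrative assumptions, not HuDOR's actual implementation.

```python
# Hedged sketch: reward a robot rollout for reproducing the object motion
# seen in a human demonstration. Names and shapes are assumptions for
# illustration, not the paper's API.
import numpy as np


def object_trajectory_reward(human_tracks: np.ndarray,
                             robot_tracks: np.ndarray) -> float:
    """Score how closely the object's motion in the robot rollout matches
    the object's motion in the human demonstration.

    Both inputs are assumed to have shape (T, K, 2): T timesteps, K tracked
    object points, 2D pixel coordinates from an off-the-shelf point tracker.
    """
    # Use displacement relative to the first frame so the reward ignores
    # where the object happens to start in each video.
    human_motion = human_tracks - human_tracks[0]
    robot_motion = robot_tracks - robot_tracks[0]

    # Mean per-point, per-timestep distance between the two motion profiles.
    error = np.linalg.norm(human_motion - robot_motion, axis=-1).mean()

    # Higher reward for smaller trajectory mismatch.
    return float(-error)


if __name__ == "__main__":
    # Synthetic stand-ins for tracker outputs, just to exercise the function.
    T, K = 50, 16
    rng = np.random.default_rng(0)
    demo_tracks = rng.normal(size=(T, K, 2)).cumsum(axis=0)
    rollout_tracks = demo_tracks + rng.normal(scale=0.1, size=(T, K, 2))
    print("reward:", object_trajectory_reward(demo_tracks, rollout_tracks))
```

Because the reward depends only on the tracked object points, it stays meaningful even though the human hand appears in the demonstration video and the robot hand appears in the rollout.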
Spotlight Video: mp4
Submission Number: 3