Giving Robots a Hand: Broadening Generalization via Hand-Centric Human Video Demonstrations

Moo Jin Kim; Jiajun Wu; Chelsea Finn

22 Sept 2022 (modified: 13 Feb 2023)
ICLR 2023 Conference Withdrawn Submission
Readers: Everyone

Keywords: imitation learning, robotics, manipulation, learning from human demonstrations, learning from observations, generalization, visuomotor control

TL;DR: We leverage hand-centric human video demonstrations to learn generalizable robotic manipulation policies via imitation learning, introducing a simple framework that removes the need for explicit human-robot domain adaptation methods.

Abstract: Videos of humans performing tasks are a promising data source for robotic manipulation because they are easy to collect in a wide range of scenarios and thus have the potential to significantly expand the generalization capabilities of vision-based robotic manipulators. Prior approaches to learning from human video demonstrations typically use third-person or egocentric data, but a central challenge in these settings is the domain shift caused by the difference in appearance between human and robot morphologies. In this work, we substantially reduce this domain gap by collecting hand-centric human video data (i.e., videos captured by a camera worn on the human demonstrator's arm). To further close the gap, we simply crop out a portion of every visual observation so that the hand is no longer visible. We propose a framework for broadening the generalization of deep robotic imitation learning policies by incorporating unlabeled data in this format, without needing any domain adaptation method, since the human embodiment is not visible in the frame. On a suite of six real robot manipulation tasks, our method substantially improves the generalization performance of manipulation policies acting on hand-centric image observations. Moreover, it enables robots to generalize both to new environment configurations and to new tasks that are unseen in the expert robot imitation data.
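The cropping step can be illustrated with a minimal sketch. The abstract does not specify which region is removed or how large it is, so this example assumes that the hand (or robot gripper) appears along the bottom edge of the arm-mounted camera image and drops a fixed fraction of rows from the bottom. The function names `crop_out_hand` and `crop_batch`, and the 0.25 default fraction, are hypothetical illustrations, not details taken from the paper. Because the abstract says every visual observation is cropped, the same transform is applied to both human and robot frames so that both domains share the same field of view.

```python
import numpy as np


def crop_out_hand(obs: np.ndarray, crop_fraction: float = 0.25) -> np.ndarray:
    """Remove the bottom `crop_fraction` of an H x W x C image.

    Assumption (hypothetical): the demonstrator's hand or the robot's gripper
    occupies the bottom of the arm-mounted camera frame.
    """
    if not 0.0 <= crop_fraction < 1.0:
        raise ValueError("crop_fraction must be in [0, 1)")
    height = obs.shape[0]
    # Keep at least one row so the output is never empty.
    keep_rows = max(1, height - int(round(height * crop_fraction)))
    return obs[:keep_rows]


def crop_batch(frames, crop_fraction: float = 0.25) -> np.ndarray:
    """Apply the identical crop to every frame (human or robot) and stack them."""
    return np.stack([crop_out_hand(f, crop_fraction) for f in frames])


# Usage: human and robot frames go through the same preprocessing.
human_frames = [np.zeros((100, 100, 3), dtype=np.uint8) for _ in range(4)]
robot_frames = [np.zeros((100, 100, 3), dtype=np.uint8) for _ in range(2)]
human_batch = crop_batch(human_frames)
robot_batch = crop_batch(robot_frames)
```

Using one shared crop for both sources, rather than a learned mapping between them, reflects the abstract's point that no domain adaptation method is needed once the embodiment is out of frame; the crop region in practice would depend on the camera mounting.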

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Submission Guidelines: Yes

Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)
