EgoMimic: Scaling Imitation Learning via Egocentric Video

Published: 26 Oct 2024 · Last Modified: 10 Nov 2024 · Venue: LFDM · License: CC BY 4.0
Keywords: Imitation Learning, Bimanual Manipulation, Learning from Human Videos
Abstract: The scale and diversity of demonstration data required for imitation learning pose a significant challenge. We present EgoMimic, a full-stack framework that scales manipulation through egocentric-view human demonstrations. EgoMimic achieves this through: (1) an ergonomic human data collection system using the Project Aria glasses, (2) a low-cost bimanual manipulator that minimizes the kinematic gap to human data, (3) cross-domain data alignment techniques, and (4) an imitation learning architecture that co-trains on hand and robot data. Compared to prior works that extract only high-level intent from human videos, our approach treats human and robot data equally as embodied demonstration data and learns a unified policy from both sources. EgoMimic achieves significant improvements over state-of-the-art imitation learning methods on a diverse set of long-horizon, single-arm and bimanual manipulation tasks, and generalizes to entirely new scenes. Finally, we show a favorable scaling trend for EgoMimic: an additional hour of hand data is significantly more valuable than an additional hour of robot data. Videos are available at https://ego-mimic.github.io.
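To make the co-training idea in the abstract concrete, below is a minimal sketch of training a single policy on mixed human-hand and robot batches. This is not the paper's actual architecture: the `UnifiedPolicy` class, the dimensions, the MSE objective, and the alternating 50/50 batch schedule are all illustrative assumptions; only the overall scheme (shared weights, human and robot data treated equally as supervision) reflects the abstract.

```python
# Sketch: co-training one policy on human hand data and robot data.
# All names and hyperparameters here are hypothetical placeholders.
import torch
import torch.nn as nn

class UnifiedPolicy(nn.Module):
    def __init__(self, obs_dim=512, act_dim=14):
        super().__init__()
        # Shared trunk: both embodiments pass through the same weights,
        # so human demonstrations shape the same features the robot uses.
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU())
        # Separate output heads absorb the remaining embodiment gap:
        # e.g., 3-D hand keypoints for human data, joint actions for robot data.
        self.hand_head = nn.Linear(256, 3)
        self.robot_head = nn.Linear(256, act_dim)

    def forward(self, obs, embodiment):
        z = self.trunk(obs)
        return self.hand_head(z) if embodiment == "human" else self.robot_head(z)

policy = UnifiedPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

def make_batch(embodiment, n=32):
    # Placeholder for aligned egocentric observations and action labels;
    # a real pipeline would draw these from the two demonstration datasets.
    target_dim = 3 if embodiment == "human" else 14
    return torch.randn(n, 512), torch.randn(n, target_dim)

for step in range(100):
    # Alternate human and robot batches so both losses update the shared trunk.
    embodiment = "human" if step % 2 == 0 else "robot"
    obs, target = make_batch(embodiment)
    loss = nn.functional.mse_loss(policy(obs, embodiment), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The key design point the sketch illustrates is that human video is used as direct low-level supervision through shared weights, rather than only as a source of high-level intent.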
Submission Number: 12