RoboTube: Learning Household Manipulation from Human Videos with Simulated Twin Environments

Haoyu Xiong; Haoyuan Fu; Jieyi Zhang; Qiang Zhang; Chen Bao; Yongxi Huang; Wenqiang Xu; Animesh Garg; Huazhe Xu; Cewu Lu

Published: 23 Jun 2022, Last Modified: 02 Mar 2025 | L-DOD 2022 Poster | Readers: Everyone

Keywords: Imitation from Videos, Video Demonstration Dataset, Real2Sim, Robotic Simulation Benchmark

Abstract: We aim to build a useful, reproducible, democratized benchmark for learning household robotic manipulation from human videos. Realizing this goal requires a diverse, high-quality human video dataset curated specifically for robots. To evaluate learning progress, a simulated twin environment that resembles the appearance and dynamics of the physical world would help roboticists and AI researchers validate their algorithms convincingly and efficiently before testing on a real robot. Hence, we present RoboTube, a human video dataset, and its digital twins for learning various robotic manipulation tasks. The RoboTube video dataset contains 5,000 video demonstrations, recorded with multi-view RGB-D cameras, of humans performing everyday household tasks, including manipulation of rigid objects, articulated objects, granular objects, and deformable objects, as well as bimanual manipulation. RT-sim, the simulated twin environment, consists of 3D-scanned, photo-realistic objects, minimizing the visual domain gap between the physical world and the simulated environment. We hope RoboTube can lower the barrier to robotics research for beginners while facilitating reproducible research in the community.
