RoboTube: Learning Household Manipulation from Human Videos with Simulated Twin Environments

30 May 2022, 15:43 (modified: 23 Jun 2022, 18:13) · L-DOD 2022 Poster
Keywords: Imitation from Videos, Video Demonstration Dataset, Real2Sim, Robotic Simulation Benchmark
Abstract: We aim to build a useful, reproducible, democratized benchmark for learning household robotic manipulation from human videos. To realize this goal, a diverse, high-quality human video dataset curated specifically for robots is desired. To evaluate the learning progress, a simulated twin environment that resembles the appearance and the dynamics of the physical world would help roboticists and AI researchers validate their algorithms convincingly and efficiently before testing on a real robot. Hence, we present RoboTube, a human video dataset, and its digital twins for learning various robotic manipulation tasks. RoboTube video dataset contains 5{,}000 video demonstrations recorded with multi-view RGB-D cameras of human-performing everyday household tasks including manipulation of rigid objects, articulated objects, granular objects, deformable objects, and bimanual manipulation. RT-sim, as the simulated twin environments, consists of 3D scanned, photo-realistic objects, minimizing the visual domain gap between the physical world and the simulated environment. We hope RoboTube can lower the barrier to robotics research for beginners while facilitating reproducible research in the community.