Keywords: Imitation Learning, Human Demonstrations, 3D Generative Foundational Model, Procedural Simulation
TL;DR: Parsing a single human demonstration to generate simulatable assets for creating an imitation learning dataset.
Abstract: Imitation learning is a common paradigm for teaching robots new tasks. However, collecting robot demonstrations through teleoperation or kinesthetic teaching is tedious and time-consuming, slowing down training data collection for policy learning. In contrast, directly demonstrating a task with our own human embodiment is much easier and such data is available in abundance, although transferring it to the robot can be non-trivial. In this work, we propose Real2Gen to train a manipulation policy from a single human demonstration. Real2Gen extracts the required information from the demonstration and transfers it to a simulation environment, where a programmable expert agent can demonstrate the task arbitrarily many times, generating an unlimited amount of data to train a flow matching policy. We evaluate Real2Gen on human demonstrations from three different real-world tasks and compare it to a recent baseline. Real2Gen achieves an average 26.6% increase in success rate and better generalization of the trained policy, owing to the abundance and diversity of its training data. We make the data, code, and trained models publicly available at real2gen.cs.uni-freiburg.de.
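The abstract mentions training a flow matching policy on the simulated expert demonstrations. Below is a minimal, hedged sketch of what a conditional flow matching training objective for such a policy could look like; it is not the authors' implementation, and the network architecture, dimensions, and data loader names are assumptions for illustration only.

```python
# Minimal sketch (assumed, not the authors' code) of a conditional flow
# matching training step for an action policy, in PyTorch.
import torch
import torch.nn as nn

class FlowMatchingPolicy(nn.Module):
    """Predicts a velocity field conditioned on observation, noisy action, and time."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs, noisy_action, t):
        return self.net(torch.cat([obs, noisy_action, t], dim=-1))

def flow_matching_loss(policy, obs, expert_action):
    """Conditional flow matching: regress the straight-line velocity
    from a noise sample toward the expert action."""
    noise = torch.randn_like(expert_action)
    t = torch.rand(expert_action.shape[0], 1)
    x_t = (1 - t) * noise + t * expert_action   # point on the interpolation path
    target_v = expert_action - noise            # constant velocity along that path
    pred_v = policy(obs, x_t, t)
    return nn.functional.mse_loss(pred_v, target_v)

# Hypothetical usage with observation-action pairs from simulated expert rollouts:
# for obs, act in sim_demo_loader:
#     loss = flow_matching_loss(policy, obs, act)
#     loss.backward(); optimizer.step(); optimizer.zero_grad()
```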
Supplementary Material: pdf
Submission Number: 16