Abstract: We study the general setup of learning from demonstration with the goal of building an agent that is capable of imitating a single video of a human demonstration to perform the task with novel objects in new scenarios. To accomplish this goal, our agent must not only understand the intent of the demonstrated third-person video in its own context, but also perform the intended task in its own environment configuration. Our main insight is to instill structure in the learning process by decoupling what to achieve (intended task) from how to perform it (controller). We learn a hierarchical setup comprising a high-level module that generates a series of first-person sub-goals conditioned on a third-person video demonstration, and a low-level controller that outputs actions to achieve those sub-goals. We show results on a real robotic platform using Baxter for the manipulation tasks of pouring and placing objects in a box. The robot videos and demos are available on the project website https://sites.google.com/view/htpi.
CMT Num: 1488
Code Link: https://github.com/pathak22/hierarchical-imitation/
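The hierarchical decoupling described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the linear maps, feature dimensions, and function names are all assumptions standing in for learned networks. A high-level module maps third-person demonstration features to a sequence of first-person sub-goal embeddings, and a low-level controller maps the current observation plus a sub-goal to an action.

```python
import numpy as np

rng = np.random.default_rng(0)

def goal_generator(demo_feats, W):
    """High level: per-frame third-person demo features -> sub-goal embeddings.
    A linear map stands in for the learned sub-goal generation network."""
    return demo_feats @ W                       # (T, feat) @ (feat, goal) -> (T, goal)

def controller(obs, goal, V):
    """Low level: first-person observation + current sub-goal -> action.
    A linear policy stands in for the learned controller."""
    return np.concatenate([obs, goal]) @ V      # (obs+goal,) @ (obs+goal, act) -> (act,)

# Illustrative dimensions (not from the paper).
feat, goal_d, obs_d, act_d, T = 128, 64, 128, 7, 5
W = rng.normal(size=(feat, goal_d))
V = rng.normal(size=(obs_d + goal_d, act_d))

demo = rng.normal(size=(T, feat))               # features of a T-frame demo video
obs = rng.normal(size=(obs_d,))                 # robot's own first-person observation

# Generate sub-goals from the demonstration, then act toward each one in turn.
goals = goal_generator(demo, W)
actions = [controller(obs, goals[t], V) for t in range(T)]
```

The point of the structure is that the high level never emits actions and the low level never sees the demonstration video, so the controller can be reused across tasks while the goal generator handles intent.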