- Keywords: imitation learning, generative models, vision, POMDP, self-supervised reinforcement learning
- TL;DR: We train a distribution-matching imitation-learning algorithm using variational models of image-based environments.
- Abstract: We consider the imitation learning setting in which the agent is provided with a fixed dataset of demonstrations. The agent can interact with the environment for exploration, but it has no access to the reward function used by the demonstrator. This setting is representative of many applications in robotics, where providing task demonstrations may be straightforward while reward shaping or conveying stylistic aspects of human motion may be difficult. For this setting, we develop a variational model-based imitation learning algorithm (VMIL) that is capable of learning policies from visual observations. Through experiments, we find that VMIL is more sample-efficient than prior algorithms on several challenging vision-based locomotion and manipulation tasks, including a high-dimensional in-hand dexterous manipulation task.