Abstract: Imitation learning is the problem of teaching an agent to replicate an expert's policy from demonstrations when the underlying reward function is unavailable. The task becomes particularly challenging when the expert demonstrates a mixture of behaviors, often modeled by a discrete or continuous latent variable.
Prior work has addressed imitation learning in such mixture scenarios by recovering the underlying latent variables, both in supervised learning (behavior cloning) and in generative adversarial imitation learning (GAIL). In several robotic locomotion tasks simulated on the MuJoCo platform, we observe that existing models fail to distinguish and imitate the different modes of behavior, for both discrete and continuous latent variables. To address this problem, we introduce a novel generative model for behavior cloning that explicitly separates modes of behavior. We also integrate our model with GAIL to achieve robustness to the compounding errors caused by unseen states. We show that our models outperform the state of the art in the aforementioned experiments.
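To make the idea of a latent-conditioned, mode-separating behavior-cloning model concrete, here is a minimal sketch in PyTorch. It assumes a discrete latent mode and an InfoGAIL-style posterior q(z|s,a) used as a mode-separation regularizer; all architecture choices, names (`LatentPolicy`, `ModePosterior`, `bc_step`), dimensions, and loss weights are illustrative assumptions, not the paper's actual model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed toy dimensions; the paper's MuJoCo tasks would use larger ones.
STATE_DIM, ACTION_DIM, LATENT_DIM = 8, 2, 4

class LatentPolicy(nn.Module):
    """pi(a | s, z): a policy conditioned on a discrete latent mode z."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + LATENT_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACTION_DIM),
        )

    def forward(self, state, z_onehot):
        return self.net(torch.cat([state, z_onehot], dim=-1))

class ModePosterior(nn.Module):
    """q(z | s, a): infers which mode produced an expert transition;
    an InfoGAIL-style stand-in for the mode-separation mechanism."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
            nn.Linear(64, LATENT_DIM),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))  # logits over z

policy, posterior = LatentPolicy(), ModePosterior()
opt = torch.optim.Adam(
    list(policy.parameters()) + list(posterior.parameters()), lr=3e-4
)

def bc_step(states, actions):
    """One behavior-cloning update with a mode-separation regularizer."""
    # Infer a latent mode for each expert (state, action) pair.
    z_logits = posterior(states, actions)
    z = F.gumbel_softmax(z_logits, tau=1.0, hard=True)  # differentiable sample
    # Reconstruct the expert action conditioned on the inferred mode.
    pred = policy(states, z)
    recon_loss = F.mse_loss(pred, actions)
    # Encourage confident, well-separated mode assignments (low entropy).
    probs = F.softmax(z_logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean()
    loss = recon_loss + 0.1 * entropy  # 0.1 is an assumed weight
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy data standing in for expert demonstrations:
demo_s, demo_a = torch.randn(32, STATE_DIM), torch.randn(32, ACTION_DIM)
print(bc_step(demo_s, demo_a))
```

In a GAIL-style integration, the reconstruction term above would be replaced by an adversarial imitation objective while the posterior term continues to enforce mode separation; the combination is what gives robustness to compounding errors on unseen states.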