Keywords: hierarchical imitation learning, learning from demonstrations, autonomous driving, causal confusion
TL;DR: We formalize and alleviate challenges in imitation learning when hierarchical policies are used to prevent mode collapse.
Abstract: Many applications of imitation learning require the agent to avoid mode collapse and mirror the full distribution of observed behaviors. Existing methods that improve this distributional realism typically rely on hierarchical policies conditioned on sampled types that model agent-internal features such as persona, goal, or strategy. However, these methods are often inappropriate for stochastic environments, where internal and external factors of influence on the observed agent trajectories must be disentangled, and only internal factors should be encoded in the agent type to remain robust to changing environment conditions. We formalize this challenge as distribution shifts in the marginal and conditional distributions of agent types under environmental stochasticity, in addition to the familiar covariate shift in state visitations. We propose Robust Type Conditioning (RTC), which eliminates these shifts with adversarial training under randomly sampled types. Experiments on two domains, including the large-scale Waymo Open Motion Dataset, show improved distributional realism while maintaining or improving task performance compared to state-of-the-art baselines.
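To make the abstract's core idea concrete, the sketch below shows a type-conditioned policy trained adversarially under randomly sampled types. This is a minimal illustrative sketch, not the authors' implementation: the module names (TypeConditionedPolicy, Discriminator), dimensions, and the simple GAIL-style single-step losses are all assumptions; RTC's actual architecture, objectives, and type distribution may differ.

```python
# Minimal sketch of type-conditioned adversarial imitation (illustrative only,
# not the RTC implementation). Dimensions, module names, and losses are assumptions.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, TYPE_DIM = 16, 4, 8

class TypeConditionedPolicy(nn.Module):
    """Maps (state, sampled type) to an action; the type is meant to capture
    agent-internal factors such as persona, goal, or strategy."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + TYPE_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACTION_DIM),
        )

    def forward(self, state, z):
        return self.net(torch.cat([state, z], dim=-1))

class Discriminator(nn.Module):
    """Scores (state, action) pairs as expert-like vs. policy-generated."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

policy, disc = TypeConditionedPolicy(), Discriminator()
opt_pi = torch.optim.Adam(policy.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    # Placeholder batches; in practice these come from demonstrations and rollouts.
    expert_s = torch.randn(32, STATE_DIM)
    expert_a = torch.randn(32, ACTION_DIM)
    s = torch.randn(32, STATE_DIM)
    z = torch.randn(32, TYPE_DIM)  # randomly sampled type
    a = policy(s, z)

    # Discriminator update: expert pairs labeled 1, policy pairs labeled 0.
    d_loss = bce(disc(expert_s, expert_a), torch.ones(32, 1)) + \
             bce(disc(s, a.detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Policy update: produce expert-like behavior regardless of which type was sampled.
    pi_loss = bce(disc(s, policy(s, z)), torch.ones(32, 1))
    opt_pi.zero_grad(); pi_loss.backward(); opt_pi.step()
```

The key point the sketch illustrates is that the type z is sampled independently of the environment during training, so only agent-internal variation can be encoded in it.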
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)
Supplementary Material: zip