Latent Hierarchical Imitation Learning for Stochastic Environments

Maximilian Igl; Punit Shah; Paul Mougin; Sirish Srinivasan; Tarun Gupta; Brandyn White; Kyriacos Shiarlis; Shimon Whiteson

Published: 01 Feb 2023, Last Modified: 13 Feb 2023, Submitted to ICLR 2023, Readers: Everyone

Keywords: hierarchical imitation learning, learning from demonstrations, autonomous driving, causal confusion

TL;DR: We formalize and alleviate challenges in imitation learning when hierarchical policies are used to prevent mode collapse.

Abstract: Many applications of imitation learning require the agent to avoid mode collapse and mirror the full distribution of observed behaviors. Existing methods that improve this distributional realism typically rely on hierarchical policies conditioned on sampled types that model agent-internal features like persona, goal, or strategy. However, these methods are often inappropriate for stochastic environments, where internal and external factors of influence on the observed agent trajectories have to be disentangled, and only internal factors should be encoded in the agent type to be robust to changing environment conditions. We formalize this challenge as distribution shifts in the marginal and conditional distributions of agent types under environmental stochasticity, in addition to the familiar covariate shift in state visitations. We propose Robust Type Conditioning (RTC), which eliminates these shifts with adversarial training under randomly sampled types. Experiments on two domains, including the large-scale Waymo Open Motion Dataset, show improved distributional realism while maintaining or improving task performance compared to state-of-the-art baselines.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Submission Guidelines: Yes

Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (e.g., decision and control, planning, hierarchical RL, robotics)

Supplementary Material: zip
