Bridging the Imitation Gap by Adaptive Insubordination

Luca Weihs; Unnat Jain; Jordi Salvador; Svetlana Lazebnik; Aniruddha Kembhavi; Alex Schwing

28 Sept 2020 (modified: 22 Jun 2025) · ICLR 2021 Conference Blind Submission · Readers: Everyone

Keywords: Privileged Experts, Imitation Learning, Reinforcement Learning, Actor-Critic, Behavior Cloning, MiniGrid, Knowledge Distillation

Abstract: When expert supervision is available, practitioners often use imitation learning with varying degrees of success. We show that when an expert has access to privileged information that is unavailable to the student, this information is marginalized in the student policy during imitation learning, resulting in an "imitation gap" and, potentially, poor results. Prior work bridges this gap via a progression from imitation learning to reinforcement learning. While often successful, gradual progression fails for tasks that require frequent switches between exploration and memorization skills. To better address these tasks and alleviate the imitation gap, we propose "Adaptive Insubordination" (ADVISOR), which dynamically weights imitation and reward-based reinforcement learning losses during training, enabling switching between imitation and exploration. On a suite of challenging didactic and MiniGrid tasks, we show that ADVISOR outperforms pure imitation, pure reinforcement learning, and their sequential and parallel combinations.
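The abstract does not state how the per-state weights are computed, so the following is only a minimal sketch under an explicit assumption: an auxiliary policy π_aux is trained purely by imitation, and the imitation weight at a state is taken to be w = exp(-α · KL(π_expert ‖ π_aux)). Where the auxiliary policy can reproduce the expert, w is near 1 and the imitation loss dominates; where it cannot (suggesting the expert is acting on privileged information), w shrinks and the reward-based loss takes over. The weighting form, the value of `alpha`, and all function names are hypothetical and not taken from this page.

```python
import math
from typing import Sequence, Tuple


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-8) -> float:
    """KL(p || q) for two discrete action distributions; zero-probability terms of p are skipped."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(target: Sequence[float], probs: Sequence[float], eps: float = 1e-8) -> float:
    """Imitation loss: cross-entropy of the student policy against the expert's action distribution."""
    return -sum(t * math.log(pi + eps) for t, pi in zip(target, probs))


def imitation_weight(expert_probs: Sequence[float], aux_probs: Sequence[float], alpha: float = 4.0) -> float:
    """Hypothetical weighting: w = exp(-alpha * KL(expert || auxiliary imitation policy)), in (0, 1]."""
    return math.exp(-alpha * kl_divergence(expert_probs, aux_probs))


def advisor_style_loss(
    expert_probs: Sequence[float],
    aux_probs: Sequence[float],
    policy_probs: Sequence[float],
    rl_loss: float,
    alpha: float = 4.0,
) -> Tuple[float, float]:
    """Blend imitation and RL losses with a per-state weight; returns (loss, weight)."""
    w = imitation_weight(expert_probs, aux_probs, alpha)
    il_loss = cross_entropy(expert_probs, policy_probs)
    return w * il_loss + (1.0 - w) * rl_loss, w


# Auxiliary policy nearly matches a one-hot expert action: imitation dominates.
_, w_match = advisor_style_loss([1.0, 0.0, 0.0], [0.98, 0.01, 0.01], [0.9, 0.05, 0.05], rl_loss=2.0)
# Auxiliary policy is near-uniform (cannot infer the expert's choice): RL dominates.
_, w_mismatch = advisor_style_loss([1.0, 0.0, 0.0], [0.34, 0.33, 0.33], [0.4, 0.3, 0.3], rl_loss=2.0)
print(f"w_match={w_match:.3f}, w_mismatch={w_mismatch:.3f}")
```

In an actual actor-critic implementation these quantities would be batched tensors and `rl_loss` would come from the underlying RL objective (e.g. an A2C or PPO loss); scalar lists are used here only to keep the sketch self-contained.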

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

One-sentence Summary: Imitation learning can fail when the expert uses privileged information; we address this by combining imitation and reward-based reinforcement learning losses using dynamic weights.

Supplementary Material: zip

Community Implementations: 2 code implementations (https://www.catalyzex.com/paper/bridging-the-imitation-gap-by-adaptive/code)

Reviewed Version (pdf): https://openreview.net/references/pdf?id=VdOepC8jqA
