Bridging the Imitation Gap by Adaptive Insubordination

Luca Weihs; Unnat Jain; Iou-Jen Liu; Jordi Salvador; Svetlana Lazebnik; Aniruddha Kembhavi; Alex Schwing

Bridging the Imitation Gap by Adaptive Insubordination

Luca Weihs, Unnat Jain, Iou-Jen Liu, Jordi Salvador, Svetlana Lazebnik, Aniruddha Kembhavi, Alex Schwing

Published: 09 Nov 2021, Last Modified: 26 May 2025NeurIPS 2021 PosterReaders: Everyone

Keywords: Imitation Learning, Reinforcement Learning, Actor-Critic, POMDPs, Privileged Experts, Visual Navigation

TL;DR: Imitation learning can fail when the expert uses privileged information, we address this by combining imitation and reward-based reinforcement learning losses using dynamic weights.

Abstract: In practice, imitation learning is preferred over pure reinforcement learning whenever it is possible to design a teaching agent to provide expert supervision. However, we show that when the teaching agent makes decisions with access to privileged information that is unavailable to the student, this information is marginalized during imitation learning, resulting in an "imitation gap" and, potentially, poor results. Prior work bridges this gap via a progression from imitation learning to reinforcement learning. While often successful, gradual progression fails for tasks that require frequent switches between exploration and memorization. To better address these tasks and alleviate the imitation gap we propose 'Adaptive Insubordination' (ADVISOR). ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration. On a suite of challenging tasks set within gridworlds, multi-agent particle environments, and high-fidelity 3D simulators, we show that on-the-fly switching with ADVISOR outperforms pure imitation, pure reinforcement learning, as well as their sequential and parallel combinations.

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

Supplementary Material: pdf

Code: https://github.com/allenai/advisor

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 2 code implementations](https://www.catalyzex.com/paper/bridging-the-imitation-gap-by-adaptive/code)

8 Replies

Loading