What Would the Expert $do(\cdot)$?: Causal Imitation Learning

Gokul Swamy; Sanjiban Choudhury; Drew Bagnell; Steven Wu

12 Oct 2021 (modified: 05 May 2023) | Deep RL Workshop NeurIPS 2021 | Readers: Everyone

Keywords: imitation learning, causal inference, reinforcement learning

TL;DR: We develop algorithms for imitation learning from data that was corrupted by unobserved confounders.

Abstract: We develop algorithms for imitation learning from policy data that was corrupted by unobserved confounders. Sources of such confounding include \textit{(a)} persistent perturbations to actions or \textit{(b)} the expert responding to a part of the state that the learner does not have access to. When a confounder affects multiple timesteps of recorded data, it can manifest as spurious correlations between states and actions that a learner might latch onto, leading to poor policy performance. To break up these spurious correlations, we apply modern variants of the classical \textit{instrumental variable regression} (IVR) technique, enabling us to recover the causally correct underlying policy \textit{without} requiring access to an interactive expert. In particular, we present two techniques: one of a generative-modeling flavor (\texttt{DoubIL}) that can utilize access to a simulator, and one of a game-theoretic flavor (\texttt{ResiduIL}) that can be run entirely offline. We discuss, from the perspective of performance, the types of confounding under which it is better to use an IVR-based technique rather than behavioral cloning, and vice versa. We find that both of our algorithms compare favorably to behavioral cloning on a simulated rocket landing task.
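
Illustration (not from the paper): the classical instrumental variable regression that the abstract refers to can be sketched with textbook two-stage least squares on synthetic scalar data. This is not \texttt{DoubIL} or \texttt{ResiduIL}. The variable names, the linear data-generating process, and the noise scales are all assumptions chosen for illustration: `x` loosely plays the role of an observed state, `y` an expert action, `u` an unobserved confounder that affects both, and `z` an instrument that influences `y` only through `x`.

```python
import numpy as np


def ols(x, y):
    """Ordinary least squares slope for zero-mean scalar data (no intercept)."""
    return (x @ y) / (x @ x)


def two_stage_least_squares(z, x, y):
    """Textbook 2SLS slope for a scalar instrument z, regressor x, outcome y."""
    # Stage 1: project the regressor onto the instrument.
    gamma = (z @ x) / (z @ z)
    x_hat = gamma * z
    # Stage 2: regress the outcome on the projected regressor.
    return (x_hat @ y) / (x_hat @ x_hat)


# Hypothetical synthetic data; every constant below is an assumption.
rng = np.random.default_rng(0)
n = 20_000
true_beta = 2.0

z = rng.normal(size=n)                            # instrument
u = rng.normal(size=n)                            # unobserved confounder
x = z + u + 0.1 * rng.normal(size=n)              # regressor, confounded by u
y = true_beta * x + u + 0.1 * rng.normal(size=n)  # outcome, also confounded by u

beta_ols = ols(x, y)                         # picks up the spurious x-u correlation
beta_iv = two_stage_least_squares(z, x, y)   # uses only variation in x driven by z

print(f"true slope: {true_beta:.3f}")
print(f"OLS slope:  {beta_ols:.3f}")
print(f"2SLS slope: {beta_iv:.3f}")
```

In this toy setting, plain regression of `y` on `x` (the analogue of behavioral cloning) absorbs the confounder's effect into the slope, while the two-stage estimate uses only the variation in `x` that the instrument explains.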

Supplementary Material: zip
