## What Would the Expert $do(\cdot)$?: Causal Imitation Learning

Published: 12 Oct 2021, 19:37 (modified: 19 Nov 2021, 15:09) · Deep RL Workshop, NeurIPS 2021 · Readers: Everyone
Keywords: imitation learning, causal inference, reinforcement learning
TL;DR: We develop algorithms for imitation learning from data that was corrupted by unobserved confounders.
Abstract: We develop algorithms for imitation learning from policy data that was corrupted by unobserved confounders. Sources of such confounding include *(a)* persistent perturbations to actions or *(b)* the expert responding to a part of the state that the learner does not have access to. When a confounder affects multiple timesteps of recorded data, it can manifest as spurious correlations between states and actions that a learner might latch on to, leading to poor policy performance. To break up these spurious correlations, we apply modern variants of the classical *instrumental variable regression* (IVR) technique, enabling us to recover the causally correct underlying policy *without* requiring access to an interactive expert. In particular, we present two techniques: one of a generative-modeling flavor (`DoubIL`) that can utilize access to a simulator, and one of a game-theoretic flavor (`ResiduIL`) that can be run entirely offline. We discuss, from a performance perspective, the types of confounding under which an IVR-based technique is preferable to behavioral cloning and vice versa. We find that both of our algorithms compare favorably to behavioral cloning on a simulated rocket-landing task.
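To give a feel for the core idea, here is a minimal sketch of classical instrumental variable regression via two-stage least squares, not the paper's `DoubIL` or `ResiduIL` algorithms. The setup is illustrative and assumed: a scalar "state" `x` and "action" `y` are both corrupted by an unobserved confounder `u`, so naive regression of `y` on `x` is biased, while an instrument `z` (correlated with `x`, independent of `u`) recovers the true coefficient.

```python
import numpy as np

def two_stage_least_squares(z, x, y):
    """Estimate the causal effect of x on y using instrument z.
    Stage 1: regress x on z to isolate the confounder-free part of x.
    Stage 2: regress y on the stage-1 prediction x_hat."""
    # Stage 1: x_hat is the component of x explained by the instrument z.
    Z = np.column_stack([np.ones_like(z), z])
    beta1, *_ = np.linalg.lstsq(Z, x, rcond=None)
    x_hat = Z @ beta1
    # Stage 2: regress y on x_hat; the slope is the causal estimate.
    X_hat = np.column_stack([np.ones_like(x_hat), x_hat])
    beta2, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta2[1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 50_000
    u = rng.normal(size=n)                        # unobserved confounder
    z = rng.normal(size=n)                        # instrument: independent of u
    x = z + u + 0.1 * rng.normal(size=n)          # "state", corrupted by u
    y = 2.0 * x + u + 0.1 * rng.normal(size=n)    # "action"; true effect is 2.0

    # Naive OLS of y on x is biased (≈ 2.5 here because u enters both x and y);
    # 2SLS recovers a slope close to the true value 2.0.
    print(two_stage_least_squares(z, x, y))
```

The analogy to the imitation-learning setting: past states can serve as instruments for current states when the confounding perturbation is independent of them, which is the kind of structure the paper's IVR-based methods exploit.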
Supplementary Material: zip