Keywords: imitation learning, causal inference, reinforcement learning
Abstract: We develop algorithms for imitation learning from data corrupted by unobserved confounders. Sources of such confounding include (a) persistent perturbations to actions and (b) the expert responding to a part of the state that the learner cannot observe. When a confounder affects multiple timesteps of recorded data, it can manifest as spurious correlations between states and actions that a learner might latch onto, leading to poor policy performance. By utilizing the effect of past states on current states, we are able to break up these spurious correlations, an application of the econometric technique of instrumental variable regression. This insight leads to two novel algorithms: one of a generative-modeling flavor ($\texttt{DoubIL}$) that can utilize access to a simulator, and one of a game-theoretic flavor ($\texttt{ResiduIL}$) that can be run entirely offline. Both approaches find policies that match the result of a query to an unconfounded expert. We find that both algorithms compare favorably to non-causal approaches on simulated control problems.
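The mechanism the abstract describes, using the past state as an instrument for the current state, admits a compact illustration. The sketch below is our own minimal example, not code from the paper: the linear dynamics, expert gain, and noise scales are all hypothetical. It simulates an expert whose action perturbation persists for two timesteps (so the noise confounds consecutive state-action pairs), then compares ordinary least squares (behavioral cloning) against two-stage least squares with the past state as the instrument.

```python
import numpy as np

# Minimal sketch under assumed linear dynamics s' = A*s + B*a and a linear
# expert a = K*s. The perturbation eps[t] hits the action at times t and
# t+1, so eps[t-1] confounds s[t] and a[t] in the recorded data.
rng = np.random.default_rng(0)
T = 200_000
A, B, K = 0.9, 0.5, -1.0   # hypothetical dynamics and expert gain
sigma = 0.5                # scale of the persistent action perturbation

s = np.zeros(T + 1)
a = np.zeros(T)
eps = rng.normal(scale=sigma, size=T)
for t in range(T):
    carry = eps[t - 1] if t > 0 else 0.0
    a[t] = K * s[t] + eps[t] + carry          # confounded expert action
    s[t + 1] = A * s[t] + B * a[t] + rng.normal(scale=0.1)

# Regress a[t] on s[t]. The instrument for s[t] is the past state s[t-1]:
# it drives s[t] through the dynamics but is independent of eps[t-1].
z, x, y = s[:T - 1], s[1:T], a[1:T]

k_bc = np.cov(x, y)[0, 1] / np.var(x)          # behavioral cloning: biased
x_hat = z * (np.cov(z, x)[0, 1] / np.var(z))   # stage 1: project s[t] on s[t-1]
k_iv = np.cov(x_hat, y)[0, 1] / np.var(x_hat)  # stage 2: regress a[t] on x_hat

print(f"true K = {K:.2f}, OLS/BC = {k_bc:.2f}, 2SLS/IV = {k_iv:.2f}")
```

Behavioral cloning overestimates the gain because $\texttt{eps[t-1]}$ raises both $\texttt{s[t]}$ (through $\texttt{a[t-1]}$) and $\texttt{a[t]}$, while the instrumented estimate recovers $K$. Roughly speaking, and per the abstract's own descriptions, $\texttt{DoubIL}$ replaces the closed-form first stage with a generative model learned via simulator access, while $\texttt{ResiduIL}$ solves a game-theoretic formulation of the instrumented moment conditions that can be run offline.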
One-sentence Summary: We develop algorithms for imitation learning from data corrupted by unobserved confounders.
Supplementary Material: zip