Using Confounded Data in Offline RL

Maxime Gasse; Damien GRASSET; Guillaume Gaudron; Pierre-Yves Oudeyer

05 Oct 2022 (modified: 05 May 2023), Offline RL Workshop NeurIPS 2022, Readers: Everyone

Keywords: causality, offline rl, confounding

TL;DR: We show how confounded offline data can be used as a regularizer in RL

Abstract: In this work, we consider the problem of confounding in offline RL, also called the delusion problem. While it is known that learning from purely offline data is hazardous in the presence of confounding, we show that offline, confounded data can be safely combined with online, non-confounded data to improve the sample efficiency of model-based RL. We import ideas from the well-established framework of $do$-calculus to express model-based RL as a causal inference problem, thus bridging the fields of RL and causality. We propose a latent-based method that we prove to be correct and efficient, in the sense that it attains better generalization guarantees thanks to the offline, confounded data (in the asymptotic case), regardless of the expert's behavior. We illustrate the effectiveness of our method on a series of synthetic experiments.
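
To make the confounding issue concrete, here is a minimal sketch on a hypothetical one-step problem. It is not the paper's method or experimental setup. It only contrasts the observational quantity P(y=1 | a), which is what offline expert data estimates, with the interventional quantity P(y=1 | do(a)), which is what an acting agent needs. All names and numbers (`P_U`, `EXPERT`, `REWARD`) are illustrative assumptions: a hidden binary variable `u` influences both the expert's action and the reward.

```python
# Hypothetical toy problem; every number below is an illustrative assumption.
# u: hidden confounder, seen by the expert but not recorded in the offline data.
P_U = [0.5, 0.5]                    # P(u)
EXPERT = [[0.9, 0.1], [0.1, 0.9]]   # EXPERT[u][a] = P(a | u), the expert's policy
REWARD = [[0.2, 0.4], [0.6, 0.8]]   # REWARD[u][a] = P(y=1 | u, a)


def observational(a):
    """P(y=1 | a) as estimated from confounded expert data."""
    joint = [P_U[u] * EXPERT[u][a] for u in (0, 1)]   # P(u, a)
    p_a = sum(joint)                                  # P(a)
    # Weight each reward by the posterior P(u | a).
    return sum(joint[u] / p_a * REWARD[u][a] for u in (0, 1))


def interventional(a):
    """P(y=1 | do(a)): the action is set independently of u."""
    # Weight each reward by the prior P(u), since do(a) cuts the u -> a link.
    return sum(P_U[u] * REWARD[u][a] for u in (0, 1))


if __name__ == "__main__":
    for a in (0, 1):
        print(a, round(observational(a), 3), round(interventional(a), 3))
```

In this toy setting the observational estimates are 0.24 and 0.76 for actions 0 and 1, while the interventional values are 0.4 and 0.6. A model fitted naively to the expert data would therefore overstate the benefit of action 1 (a gap of 0.52 instead of 0.2), because the expert tends to pick action 1 exactly when `u` already favors a high reward.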