Keywords: Reinforcement Learning; Causal Inference; POMDP; Unobserved Confounders
Abstract: We consider model-based reinforcement learning (MBRL) in confounded partially observable Markov decision processes (POMDPs), where unobserved confounders in the environment introduce bias into the learned dynamics model. Existing studies either rely on kernel functions to model the environment or are restricted to tabular settings where the observation space is discrete. To address these limitations, we propose a deep proximal causal MBRL (DPC-MBRL) method. Specifically, we first establish a consistent identification result for the policy value in confounded POMDPs through proximal causal inference. Based on this result, we then employ neural networks to model the environment dynamics, enabling more flexible function approximation than existing approaches. Through experiments on the MuJoCo physics simulation benchmark and a real-world medical dataset, we demonstrate that DPC-MBRL mitigates the bias induced by unobserved confounders and yields more accurate dynamics-model estimates than standard MBRL approaches.
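The abstract does not spell out the architecture, but the core idea it describes, conditioning a learned neural dynamics model on proxy variables that stand in for the unobserved confounder, can be sketched roughly as follows. Every name here (ProxyDynamicsModel, obs_dim, proxy_dim, the MSE objective) is an illustrative assumption, not the paper's actual DPC-MBRL implementation or loss.

```python
# Hypothetical sketch, assuming a proxy-conditioned neural dynamics model;
# the real DPC-MBRL architecture and proximal-causal objective are defined
# in the full paper, not in this abstract.
import torch
import torch.nn as nn

class ProxyDynamicsModel(nn.Module):
    """Predicts the next observation from the current observation, the
    action, and a proxy variable serving as a stand-in for the unobserved
    confounder (the proximal causal inference idea)."""

    def __init__(self, obs_dim: int, act_dim: int, proxy_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + proxy_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, obs_dim),
        )

    def forward(self, obs, act, proxy):
        # Concatenate observation, action, and confounder proxy before
        # feeding the joint input through the network.
        return self.net(torch.cat([obs, act, proxy], dim=-1))

# Toy fitting loop on random tensors, only to show the interface;
# dimensions and data are placeholders.
model = ProxyDynamicsModel(obs_dim=4, act_dim=2, proxy_dim=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
obs, act, proxy = torch.randn(64, 4), torch.randn(64, 2), torch.randn(64, 3)
next_obs = torch.randn(64, 4)  # placeholder targets
for _ in range(10):
    loss = nn.functional.mse_loss(model(obs, act, proxy), next_obs)
    opt.zero_grad()
    loss.backward()
    opt.step()
```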
Primary Area: reinforcement learning
Submission Number: 15457