Keywords: Reinforcement Learning; Causal Inference; POMDP; Unobserved Confounders
Abstract: We consider model-based reinforcement learning (MBRL) in confounded partially observable Markov decision processes (POMDPs), where unobserved confounders in the environment introduce bias into the learned dynamics model. Existing studies either rely on kernel functions to model the environment or are restricted to tabular settings where the observation space is discrete. To address these limitations, we propose a deep proximal causal MBRL (DPC-MBRL) method. Specifically, we first establish a consistent identification result for the policy value in confounded POMDPs through proximal causal inference. Based on this result, we then employ neural networks to model the environment dynamics, enabling more flexible function approximation than existing approaches. Through experiments on the MuJoCo physics simulation benchmark and a real-world medical dataset, we demonstrate that DPC-MBRL mitigates the bias induced by unobserved confounders and yields more accurate dynamics-model estimates than standard MBRL approaches.
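The abstract does not spell out the architecture, but the core idea it describes, conditioning a learned neural dynamics model on proxy variables that stand in for the unobserved confounder, can be sketched roughly as follows. Every name here (ProxyDynamicsModel, obs_dim, proxy_dim, the MSE objective) is an illustrative assumption, not the paper's actual DPC-MBRL implementation or loss.

```python
# Hypothetical sketch, assuming a proxy-conditioned neural dynamics model;
# the real DPC-MBRL architecture and proximal-causal objective are defined
# in the full paper, not in this abstract.
import torch
import torch.nn as nn

class ProxyDynamicsModel(nn.Module):
    """Predicts the next observation from the current observation, the
    action, and a proxy variable serving as a stand-in for the unobserved
    confounder (the proximal causal inference idea)."""

    def __init__(self, obs_dim: int, act_dim: int, proxy_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + proxy_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, obs_dim),
        )

    def forward(self, obs, act, proxy):
        # Concatenate observation, action, and confounder proxy before
        # feeding the joint input through the network.
        return self.net(torch.cat([obs, act, proxy], dim=-1))

# Toy fitting loop on random tensors, only to show the interface;
# dimensions and data are placeholders.
model = ProxyDynamicsModel(obs_dim=4, act_dim=2, proxy_dim=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
obs, act, proxy = torch.randn(64, 4), torch.randn(64, 2), torch.randn(64, 3)
next_obs = torch.randn(64, 4)  # placeholder targets
for _ in range(10):
    loss = nn.functional.mse_loss(model(obs, act, proxy), next_obs)
    opt.zero_grad()
    loss.backward()
    opt.step()
```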
Primary Area: reinforcement learning
Submission Number: 15457