Confounding-Robust Fitted-Q-Iteration under Observed Markovian Marginals

Published: 28 Nov 2025, Last Modified: 30 Nov 2025
Venue: NeurIPS 2025 Workshop MLxOR
License: CC BY 4.0
Keywords: causal reinforcement learning, offline reinforcement learning
Abstract: Sequential decision-making problems in medicine, economics, and e-commerce require the use of historical observational data when online experimentation is costly, dangerous, or unethical. Given the rise of big data, these observational datasets are increasingly large and widely available, with great potential to improve decisions by personalizing treatments for those who benefit most. The recent literature on offline reinforcement learning (RL) has made extensive progress on evaluating and optimizing sequential decision rules given only historical datasets of observed trajectories. In particular, methods that target estimation of the Q-function via black-box regression, such as fitted-Q-evaluation and fitted-Q-iteration (FQE/FQI), have gained popularity due to their computational ease and scalability \citep{voloshin2019empirical}.
Submission Number: 150
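For readers unfamiliar with the baseline method the abstract references, a minimal sketch of standard fitted-Q-iteration with a black-box regressor is given below. It illustrates only the generic FQI loop, not the confounding-robust variant proposed in the submission; the discrete action space, the scikit-learn random forest, and the synthetic logged data are illustrative assumptions.

```python
# Minimal sketch of standard fitted-Q-iteration (FQI) from logged transitions.
# Assumptions for illustration: discrete actions, a random-forest regressor,
# and synthetic data; terminal-state handling is omitted for brevity.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fitted_q_iteration(states, actions, rewards, next_states, n_actions,
                       gamma=0.99, n_iters=50):
    """Estimate Q(s, a) by repeatedly regressing onto Bellman targets."""
    def featurize(s, a):
        # Concatenate the state with a one-hot encoding of the discrete action.
        onehot = np.eye(n_actions)[a]
        return np.hstack([s, onehot])

    X = featurize(states, actions)
    q_model = None
    for _ in range(n_iters):
        if q_model is None:
            targets = rewards  # first iteration: Q_0 is taken to be zero
        else:
            # Bellman targets: r + gamma * max_a' Q_k(s', a')
            next_q = np.column_stack([
                q_model.predict(featurize(next_states,
                                          np.full(len(next_states), a)))
                for a in range(n_actions)
            ])
            targets = rewards + gamma * next_q.max(axis=1)
        # Black-box regression step onto the current targets.
        q_model = RandomForestRegressor(n_estimators=100).fit(X, targets)
    return q_model

# Toy usage on synthetic logged data (hypothetical: 4-dim states, 2 actions).
rng = np.random.default_rng(0)
n, d, n_actions = 500, 4, 2
S = rng.normal(size=(n, d))
A = rng.integers(n_actions, size=n)
R = rng.normal(size=n)
S_next = rng.normal(size=(n, d))
q_hat = fitted_q_iteration(S, A, R, S_next, n_actions, n_iters=5)
```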