Keywords: Causal Machine Learning, Doubly Robust Estimation, Neyman-Orthogonality, Markov Decision Process
Abstract: Predicting individualized potential outcomes in sequential decision-making is central to optimizing therapeutic decisions in personalized medicine (e.g., which dosing sequence to give to a cancer patient). However, predicting potential outcomes over long horizons is notoriously difficult. Existing methods that break the curse of the horizon typically lack strong theoretical guarantees such as orthogonality and quasi-oracle efficiency. In this paper, we revisit the problem of predicting individualized potential outcomes in sequential decision-making (i.e., estimating Q-functions in Markov decision processes with observational data) through a causal inference lens. In particular, we develop a comprehensive theoretical foundation for meta-learners in this setting, with a focus on desirable theoretical properties. As a result, we obtain a novel meta-learner, called the DRQ-learner, and establish that it is (1) doubly robust (i.e., it permits valid inference under model misspecification), (2) Neyman-orthogonal (i.e., insensitive to first-order estimation errors in the nuisance functions), and (3) quasi-oracle efficient (i.e., it behaves asymptotically as if the ground-truth nuisance functions were known). Our DRQ-learner is applicable to settings with both discrete and continuous state spaces. Further, it is flexible and can be combined with arbitrary machine learning models (e.g., neural networks). We validate our theoretical results through numerical experiments, showing that our meta-learner outperforms state-of-the-art baselines.
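To make the abstract's terminology concrete, the sketch below shows a cross-fitted doubly robust (AIPW-style) pseudo-outcome for a single decision step. This is an illustrative sketch only, not the paper's DRQ-learner (which targets Q-functions over full Markov decision processes); the function name `dr_pseudo_outcome` and the choice of nuisance models are hypothetical assumptions made for illustration.

```python
# Illustrative sketch (not the paper's DRQ-learner): a cross-fitted,
# doubly robust pseudo-outcome for the potential outcome Y(a) at a
# single decision step. All names here are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def dr_pseudo_outcome(X, A, Y, a, n_splits=2, seed=0):
    """Cross-fitted doubly robust (AIPW) scores for Y(a).

    The score combines an outcome regression mu(X) = E[Y | X, A=a] with an
    inverse-propensity correction: it remains consistent if either nuisance
    model is correctly specified (double robustness), and cross-fitting the
    nuisances is a standard way to avoid own-observation bias.
    """
    scores = np.empty(len(Y))
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        # Nuisance 1: propensity e(X) = P(A=a | X), fit on the training fold.
        e = LogisticRegression().fit(X[train], A[train] == a)
        e_hat = np.clip(e.predict_proba(X[test])[:, 1], 1e-3, 1 - 1e-3)
        # Nuisance 2: outcome model mu(X) = E[Y | X, A=a], fit on rows with A=a.
        treated = train[A[train] == a]
        mu = RandomForestRegressor().fit(X[treated], Y[treated])
        mu_hat = mu.predict(X[test])
        # Doubly robust score evaluated on the held-out fold.
        scores[test] = mu_hat + (A[test] == a) * (Y[test] - mu_hat) / e_hat
    return scores
```

Averaging the returned scores estimates the population mean of Y(a), while regressing them on covariates gives an individualized prediction; the insensitivity to first-order nuisance estimation errors mentioned in the abstract (Neyman-orthogonality) is a property of this type of score.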
Primary Area: causal reasoning
Submission Number: 7404