Off-Policy Evaluation under Nonignorable Missing Data

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · License: CC BY 4.0
TL;DR: This paper proposes a systematic framework for handling both ignorable and nonignorable missingness in off-policy evaluation, with a comprehensive theoretical analysis of estimation consistency and asymptotic normality.
Abstract: Off-Policy Evaluation (OPE) aims to estimate the value of a target policy using offline data collected from potentially different policies. In real-world applications, however, logged data often suffers from missingness. While OPE has been extensively studied in the literature, a theoretical understanding of how missing data affects OPE results is still lacking. In this paper, we investigate OPE in the presence of monotone missingness and theoretically demonstrate that value estimates remain unbiased under ignorable missingness but can be biased under nonignorable (informative) missingness. To restore the consistency of value estimation, we propose an inverse-probability-weighted value estimator and conduct statistical inference to quantify the uncertainty of the estimates. Through a series of numerical experiments, we empirically demonstrate that the proposed estimator yields more reliable value inference under missing data.
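To make the weighting idea concrete, below is a minimal Python sketch of an inverse-probability-weighted value estimate under monotone dropout. It is illustrative only: the trajectory format, the `observe_prob` dropout model, and all function names are assumptions for exposition, not the paper's implementation.

```python
def ipw_value_estimate(trajectories, target_pi, behavior_pi, observe_prob, gamma=0.99):
    """Average discounted return, reweighting each observed reward by
    (i) the target/behavior policy importance ratio and (ii) the inverse
    probability that the trajectory is still observed at that step.

    Each trajectory is a list of (state, action, reward) tuples, with
    reward == None from the dropout step onward (monotone missingness).
    """
    total = 0.0
    for traj in trajectories:
        rho = 1.0    # cumulative policy importance ratio
        p_obs = 1.0  # cumulative probability of remaining observed
        for t, (s, a, r) in enumerate(traj):
            rho *= target_pi(a, s) / behavior_pi(a, s)
            p_obs *= observe_prob(s, a)  # P(still observed at step t)
            if r is None:                # dropout: all later rewards missing too
                break
            total += (gamma ** t) * rho * r / p_obs
    return total / len(trajectories)
```

Under ignorable missingness, `observe_prob` can be fit from observed histories alone; under nonignorable missingness it depends on the missing outcome itself, which is where the shadow variable comes in (see the sketch after the Lay Summary).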
Lay Summary: How can we evaluate the effectiveness of a treatment or recommendation when some of the most important outcomes are missing, and worse, missing for reasons we cannot observe? This is a common issue in real-world data, especially in healthcare, where patients may drop out of a study over time due to recovery or death. If not properly handled, such “informative” missingness can seriously distort evaluations of treatment effectiveness. Our method tackles this problem. We introduce a novel approach that works even when whether data are missing depends on the unseen outcomes themselves, a setting that no existing method handles in long-term decision problems. The key idea is to use a shadow variable, a proxy that carries information about the missing outcome. By incorporating it into our estimation process, we can correct for the hidden bias and recover accurate evaluations. We also provide a way to quantify how confident we are in the results, ensuring reliability. This makes it possible to evaluate policies, such as a new drug treatment plan or a personalized recommendation system, with far greater accuracy even when the data is messy or incomplete. Our work lays a foundation for safer clinical decisions and smarter AI systems in the face of real-world uncertainty.
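As a rough illustration of the shadow-variable idea, the sketch below fits a missingness propensity that is allowed to depend on the possibly missing outcome itself. The logistic form, the single scalar shadow variable `Z`, and the particular moment conditions are simplifying assumptions for exposition, not the paper's estimator.

```python
import numpy as np
from scipy.optimize import fsolve

def fit_nonignorable_propensity(R, X, Z, Y):
    """Solve shadow-variable moment conditions for a logistic propensity
    P(observed | x, y) = sigmoid(t0 + t1*x + t2*y), where Y is missing
    whenever R == 0. Key identity: E[h(X, Z) * (R / pi(X, Y) - 1)] = 0
    is computable from observed data, because R/pi - 1 equals -1 on
    missing rows (no y needed) and 1/pi - 1 on observed rows (y available).
    """
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    Y_filled = np.where(R == 1, np.nan_to_num(Y), 0.0)  # placeholder where Y is missing

    def moments(theta):
        pi = sigmoid(theta[0] + theta[1] * X + theta[2] * Y_filled)
        resid = np.where(R == 1, 1.0 / pi - 1.0, -1.0)  # R/pi - 1
        # Three moments for three parameters; Z supplies the extra equation
        # that identifies the outcome coefficient t2.
        return [np.mean(resid), np.mean(X * resid), np.mean(Z * resid)]

    return fsolve(moments, x0=np.zeros(3))
```

Once the propensity is estimated, each observed transition can be reweighted by its inverse inside an OPE estimator like the one sketched above, correcting the bias induced by informative dropout.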
Link To Code: https://github.com/YangXU63/OPE_MNAR_ICML25
Primary Area: Reinforcement Learning->Batch/Offline
Keywords: Off-Policy Evaluation, Missing Data, Nonignorable Missingness, Shadow Variable, Causality
Submission Number: 5721