When is Off-Policy Evaluation Useful? A Data-Centric Perspective

ICLR 2024 Workshop DMLR Submission 48 Authors

Published: 04 Mar 2024, Last Modified: 02 May 2024
Venue: DMLR @ ICLR 2024
License: CC BY 4.0
Keywords: Data-Centric AI, Off-Policy Evaluation
Abstract: Evaluating the value of a hypothetical target policy with only a logged dataset is important but challenging. On the one hand, it offers opportunities for safe policy improvement in high-stakes settings such as clinical guidelines. On the other hand, such opportunities demand precise off-policy evaluation (OPE). While previous work on OPE has focused on \textbf{improving the algorithms} used for value estimation, in this work we emphasize the importance of the \textbf{offline dataset} and propose DataCOPE, a data-centric framework for \textbf{\textit{evaluating} OPE} that answers the questions of whether, and to what extent, a target policy can be evaluated given a dataset. DataCOPE (1) forecasts the overall performance of OPE algorithms without access to the environment, which is especially useful before real-world deployment, where \textit{evaluating OPE is impossible}; (2) identifies the sub-groups in the dataset for which OPE can be inaccurate; (3) permits the evaluation of datasets or data-collection strategies for OPE problems. Our empirical analysis of DataCOPE in logged contextual bandit settings using healthcare datasets confirms its ability to evaluate both machine-learning policies and human expert policies such as clinical guidelines.
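As a concrete illustration of the OPE problem the abstract refers to (not of the DataCOPE method itself), here is a minimal sketch of inverse propensity scoring (IPS), a standard OPE estimator for logged contextual bandits. All names and the `target_policy` interface are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ips_value_estimate(contexts, actions, rewards, logging_probs, target_policy):
    """Inverse propensity scoring (IPS) estimate of a target policy's value
    from logged contextual-bandit data.

    contexts:      array of shape (n, d) - observed contexts
    actions:       array of shape (n,)   - actions taken by the logging policy
    rewards:       array of shape (n,)   - observed rewards
    logging_probs: array of shape (n,)   - probability the logging policy
                                           assigned to each logged action
    target_policy: callable (context, action) -> probability of that action
                   under the target policy we wish to evaluate
    """
    target_probs = np.array(
        [target_policy(x, a) for x, a in zip(contexts, actions)]
    )
    # Reweight each logged reward by how much more (or less) likely the
    # target policy is to take the logged action than the logging policy was.
    weights = target_probs / logging_probs
    return float(np.mean(weights * rewards))
```

The estimator is unbiased when the logging probabilities are known and nonzero wherever the target policy has support, but its variance grows where the target and logging policies disagree; this dataset-policy mismatch is the kind of property a data-centric analysis of OPE aims to surface.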
Primary Subject Area: Other
Paper Type: Research paper: up to 8 pages
DMLR For Good Track: Participate in DMLR for Good Track
Participation Mode: In-person
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Submission Number: 48