Cross-Validated Off-Policy Evaluation

Published: 01 Jan 2024, Last Modified: 01 Aug 2025, CoRR 2024, CC BY-SA 4.0
Abstract: We study estimator selection and hyper-parameter tuning in off-policy evaluation. Although cross-validation is the most popular method for model selection in supervised learning, off-policy evaluation relies mostly on theory, which provides only limited guidance to practitioners. We show how to use cross-validation for off-policy evaluation. This challenges a popular belief that cross-validation in off-policy evaluation is not feasible. We evaluate our method empirically and show that it addresses a variety of use cases.
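The abstract does not spell out the procedure, so the following is only a minimal sketch of how cross-validation might be used to choose between off-policy estimators, not the authors' method. It assumes a context-free bandit, logged tuples of (action, logging propensity, reward), and inverse propensity scoring (IPS) on the held-out split as an unbiased reference value. The names `ips`, `clipped_ips`, and `select_estimator`, the 50/50 split, and the synthetic data are all hypothetical choices made for illustration.

```python
import random


def ips(data, target):
    """Inverse propensity scoring: mean of (pi(a) / mu(a)) * r over logged data."""
    return sum(target[a] / p * r for a, p, r in data) / len(data)


def clipped_ips(clip):
    """Build an IPS variant whose importance weights are capped at `clip`."""
    def estimate(data, target):
        return sum(min(target[a] / p, clip) * r for a, p, r in data) / len(data)
    return estimate


def select_estimator(data, target, candidates, n_splits=20, seed=0):
    """Pick the candidate whose training-split estimate is closest, on average,
    to an unbiased IPS estimate computed on the validation split."""
    rng = random.Random(seed)
    errors = {name: 0.0 for name in candidates}
    for _ in range(n_splits):
        shuffled = list(data)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2  # assumed 50/50 split
        train, valid = shuffled[:half], shuffled[half:]
        reference = ips(valid, target)  # unbiased but noisy proxy for the true value
        for name, estimator in candidates.items():
            errors[name] += (estimator(train, target) - reference) ** 2
    mean_errors = {name: total / n_splits for name, total in errors.items()}
    return min(mean_errors, key=mean_errors.get), mean_errors


# Synthetic logged data (assumed for illustration): two actions, Bernoulli rewards.
data_rng = random.Random(1)
logging = [0.8, 0.2]      # logging policy mu
target = [0.2, 0.8]       # target policy pi to be evaluated
reward_prob = [0.3, 0.7]  # expected reward of each action
logged = []
for _ in range(2000):
    a = 0 if data_rng.random() < logging[0] else 1
    r = 1.0 if data_rng.random() < reward_prob[a] else 0.0
    logged.append((a, logging[a], r))

candidates = {"ips": ips, "clipped_ips_2": clipped_ips(2.0)}
best, mean_errors = select_estimator(logged, target, candidates)
print(best, mean_errors)
```

The same loop could in principle tune a hyper-parameter by listing one candidate per value, for example several `clipped_ips` thresholds. Because the validation reference is itself noisy, the split ratio and the number of splits affect how reliable the selection is.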