Universal Off-Policy Evaluation

Yash Chandak; Scott Niekum; Bruno Castro da Silva; Erik Learned-Miller; Emma Brunskill; Philip S. Thomas

Published: 09 Nov 2021, Last Modified: 26 May 2025 | NeurIPS 2021 Poster | Readers: Everyone

Keywords: metrics, distribution, CDF, off-policy evaluation, OPE, reinforcement learning, counterfactuals, high-confidence bounds, confidence intervals

TL;DR: We develop an off-policy method that estimates, and provides high-confidence bounds for, any parameter of the return distribution (e.g., mean, variance, CVaR, or quantiles) in a variety of settings.

Abstract: When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy. Those predictions must often be based on data collected under some previously used decision-making rule. Many previous methods enable such off-policy (or counterfactual) estimation of the _expected_ value of a performance measure called the return. In this paper, we take the first steps towards a 'universal off-policy estimator' (UnO): one that provides off-policy estimates and high-confidence bounds for _any_ parameter of the return distribution. We use UnO for estimating and simultaneously bounding the mean, variance, quantiles/median, inter-quantile range, CVaR, and the entire cumulative distribution of returns. Finally, we also discuss UnO's applicability in various settings, including fully observable, partially observable (i.e., with unobserved confounders), Markovian, non-Markovian, stationary, smoothly non-stationary, and discrete distribution shifts.
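As a rough illustration of the idea in the abstract, the sketch below estimates the cumulative distribution of returns from off-policy data and then reads several parameters off that single estimate. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not spell out the estimator, so this uses a self-normalized importance-weighted empirical CDF, assumes the per-trajectory importance ratios are already computed, and omits the high-confidence bounds entirely. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def weighted_cdf(returns: Sequence[float],
                 weights: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Self-normalized importance-weighted empirical CDF of returns.

    returns[i] is the return observed on trajectory i under the behavior
    policy; weights[i] is that trajectory's importance ratio (assumed to be
    precomputed and non-negative). Self-normalization is an assumption made
    here so that the estimate ends at exactly 1.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("importance weights must have a positive sum")
    xs: List[float] = []
    cdf: List[float] = []
    running = 0.0
    for g, w in sorted(zip(returns, weights)):
        running += w / total
        if xs and xs[-1] == g:
            cdf[-1] = running  # merge repeated return values
        else:
            xs.append(g)
            cdf.append(running)
    return xs, cdf


def cdf_mean(xs: Sequence[float], cdf: Sequence[float]) -> float:
    """Mean implied by a step CDF: sum of x times the jump at x."""
    prev, acc = 0.0, 0.0
    for x, c in zip(xs, cdf):
        acc += x * (c - prev)
        prev = c
    return acc


def cdf_variance(xs: Sequence[float], cdf: Sequence[float]) -> float:
    """Variance implied by a step CDF, via E[X^2] - E[X]^2."""
    prev, second = 0.0, 0.0
    for x, c in zip(xs, cdf):
        second += x * x * (c - prev)
        prev = c
    return second - cdf_mean(xs, cdf) ** 2


def cdf_quantile(xs: Sequence[float], cdf: Sequence[float],
                 alpha: float) -> float:
    """Smallest support point whose CDF value is at least alpha."""
    for x, c in zip(xs, cdf):
        if c >= alpha:
            return x
    return xs[-1]


# Toy data: four trajectories with hypothetical importance ratios.
xs, cdf = weighted_cdf([1.0, 2.0, 3.0, 4.0], [2.0, 0.0, 0.0, 2.0])
print(cdf_mean(xs, cdf), cdf_quantile(xs, cdf, 0.75))
```

The design point is that the mean, variance, and quantiles are all computed from the same estimated CDF, so no separate estimator is needed per parameter.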

Code: https://github.com/yashchandak/UnO

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/universal-off-policy-evaluation/code)
