Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning

Cameron Voloshin; Hoang Minh Le; Nan Jiang; Yisong Yue

Published: 29 Jul 2021, Last Modified: 26 May 2025

Venue: NeurIPS 2021 Datasets and Benchmarks Track (Round 1)

Readers: Everyone

Keywords: reinforcement learning, off-policy evaluation, benchmark, OPE, RL, off-policy policy evaluation, empirical study

TL;DR: We offer an experimental benchmark and empirical study for off-policy policy evaluation in reinforcement learning

Abstract: We offer an experimental benchmark and empirical study for off-policy policy evaluation (OPE) in reinforcement learning, which is a key problem in many safety-critical applications. Given the increasing interest in deploying learning-based methods, there has been a flurry of recent proposals for OPE methods, leading to a need for standardized empirical analyses. Our work places a strong focus on diversity of experimental design to enable stress testing of OPE methods. We provide a comprehensive benchmarking suite to study the interplay of different attributes on method performance. We distill the results into a summarized set of guidelines for OPE in practice. Our software package, the Caltech OPE Benchmarking Suite (COBS), is open-sourced, and we invite interested researchers to further contribute to the benchmark.

Supplementary Material: zip

URL: https://github.com/clvoloshin/COBS

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/empirical-study-of-off-policy-policy/code)
