Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation

03 Jun 2021 (modified: 24 May 2023). Submitted to NeurIPS 2021 Datasets and Benchmarks Track (Round 1). Readers: Everyone
Keywords: off-policy evaluation, real-world dataset, open-source software, benchmark experiments, offline contextual bandits
TL;DR: A large-scale public real-world dataset and open-source software that enable realistic and reproducible experiments and implementations of off-policy evaluation
Abstract: Off-policy evaluation (OPE) aims to estimate the performance of hypothetical policies using data generated by a different policy. Because of its large potential impact, research interest in OPE has been growing. There is, however, no public real-world dataset that enables the evaluation of OPE estimators, which makes experimental studies of OPE unrealistic and irreproducible. To enable realistic and reproducible OPE research, we release the Open Bandit Dataset, collected on a large-scale fashion e-commerce platform, ZOZOTOWN. Our dataset is unique in that it contains multiple logged bandit feedback datasets collected by running different policies on the same platform. This enables realistic and reproducible experimental comparisons of different OPE estimators for the first time. We also develop Python software called Open Bandit Pipeline to streamline and standardize the implementation of batch bandit algorithms and OPE. Our open data and pipeline will contribute to fair and transparent OPE research and help the community identify fruitful research directions. Finally, we provide extensive benchmark experiments of existing OPE estimators using our data and pipeline. The results reveal essential challenges and open up new avenues for future OPE research.
Supplementary Material: zip
URL: Public Real-World Dataset: https://research.zozo.com/data.html / Open-Source Software (Open Bandit Pipeline): https://github.com/st-tech/zr-obp
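
To make the idea in the abstract concrete, the following is a minimal sketch of one standard OPE estimator, inverse probability weighting (IPW), which estimates the value of a target policy from data logged by a different (behavior) policy. It is an illustration only: it does not use the Open Bandit Pipeline API or the Open Bandit Dataset, the logged data are synthetic, the setting is context-free for simplicity, and all names (`ipw_estimate`, `simulate_logged_feedback`, `TRUE_MEANS`, `BEHAVIOR`, `TARGET`) are hypothetical.

```python
import random


def ipw_estimate(rewards, actions, behavior_pscores, target_dists):
    """Inverse probability weighting estimate of a target policy's value.

    rewards[i]          : observed reward in round i
    actions[i]          : action chosen by the behavior policy in round i
    behavior_pscores[i] : probability the behavior policy gave to actions[i]
    target_dists[i]     : target policy's action distribution in round i
    """
    weighted = [
        r * pi_e[a] / p_b
        for r, a, p_b, pi_e in zip(rewards, actions, behavior_pscores, target_dists)
    ]
    return sum(weighted) / len(weighted)


def simulate_logged_feedback(n_rounds, true_mean_rewards, behavior_dist, seed=0):
    """Generate synthetic logged bandit feedback (assumption: Bernoulli rewards)."""
    rng = random.Random(seed)
    n_actions = len(true_mean_rewards)
    actions, rewards, pscores = [], [], []
    for _ in range(n_rounds):
        # The behavior policy samples an action from its own distribution.
        a = rng.choices(range(n_actions), weights=behavior_dist)[0]
        # Only the reward of the chosen action is observed (bandit feedback).
        r = 1.0 if rng.random() < true_mean_rewards[a] else 0.0
        actions.append(a)
        rewards.append(r)
        pscores.append(behavior_dist[a])
    return actions, rewards, pscores


# Hypothetical three-action problem; none of these numbers come from the paper.
TRUE_MEANS = [0.1, 0.3, 0.6]      # expected reward of each action
BEHAVIOR = [1 / 3, 1 / 3, 1 / 3]  # uniform-random logging policy
TARGET = [0.1, 0.1, 0.8]          # policy whose value we want to estimate

actions, rewards, pscores = simulate_logged_feedback(20000, TRUE_MEANS, BEHAVIOR)
target_dists = [TARGET] * len(actions)

estimate = ipw_estimate(rewards, actions, pscores, target_dists)
# In a simulation the true value is known, so the estimate can be checked.
true_value = sum(p * m for p, m in zip(TARGET, TRUE_MEANS))

print(f"IPW estimate: {estimate:.3f}, true value: {true_value:.3f}")
```

In this simulation the true policy value is available by construction, which is exactly what real logged data usually lacks. Having logs collected by several different policies on the same platform, as the abstract describes, is presumably what allows an estimator's output to be compared against an observed reference instead of a simulated one.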