Datasets for Online Controlled Experiments

Published: 11 Oct 2021, Last Modified: 23 May 2023
Venue: NeurIPS 2021 Datasets and Benchmarks Track (Round 2)
Readers: Everyone
Keywords: Online controlled experiments, A/B test, Dataset survey, Dataset taxonomy, Statistical tests
TL;DR: The first survey and taxonomy for online controlled experiment datasets, together with the first public dataset that supports the design and running of experiments with adaptive stopping.
Abstract: Online Controlled Experiments (OCE) are the gold standard to measure impact and guide decisions for digital products and services. Despite many methodological advances in this area, the scarcity of public datasets and the lack of a systematic review and categorization hinder its development. We present the first survey and taxonomy for OCE datasets, which highlight the lack of a public dataset to support the design and running of experiments with adaptive stopping, an increasingly popular approach to enable quickly deploying improvements or rolling back degrading changes. We release the first such dataset, containing daily checkpoints of decision metrics from multiple, real experiments run on a global e-commerce platform. The dataset design is guided by a broader discussion on data requirements for common statistical tests used in digital experimentation. We demonstrate how to use the dataset in the adaptive stopping scenario using sequential and Bayesian hypothesis tests and learn the relevant parameters for each approach.
Supplementary Material: pdf
URL: The dataset, its schema and accompanying datasheet (with the intended uses, plus the hosting, licensing, and maintenance plan), and the experiment code are available on the Open Science Framework (DOI: 10.17605/OSF.IO/64JSB). The project (and dataset) is open-sourced under a CC-By Attribution 4.0 International license.
Contribution Process Agreement: Yes
Dataset Url: The dataset, its schema and accompanying datasheet (with the intended uses, plus the hosting, licensing, and maintenance plan), and the experiment code are available on Open Science Framework:
Dataset Embargo: N/A
License: The project (including the dataset) is open-sourced under a CC-By Attribution 4.0 International license.
Author Statement: Yes
