Keywords: Anytime Valid, Sequential Testing, Experimentation, Bayesian Methods, Martingales, A/B Testing, Confidence Sequences, e-processes
TL;DR: We develop "always-valid" sequential statistical procedures for several important applications in the experimentation space dealing with count data. Illustrated with real A/B test data from Netflix.
Abstract: Many experiments compare count outcomes among treatment groups. Examples include the number of successful signups in conversion rate experiments or the number of errors produced by software versions in canary tests. Observations typically arrive in a sequence and practitioners wish to continuously monitor their experiments, sequentially testing hypotheses while maintaining Type I error probabilities under optional stopping and continuation. These goals are frequently complicated in practice by non-stationary time dynamics. We provide practical solutions through sequential tests of multinomial hypotheses, hypotheses about many inhomogeneous Bernoulli processes and hypotheses about many time-inhomogeneous Poisson counting processes. For estimation, we further provide confidence sequences for multinomial probability vectors, all contrasts among probabilities of inhomogeneous Bernoulli processes and all contrasts among intensities of time-inhomogeneous Poisson counting processes. Together, these provide an ``anytime-valid'' inference framework for a wide variety of experiments dealing with count outcomes, which we illustrate with several industry applications.
Supplementary Material: pdf