YEAST: Yet Another Sequential Test

Published: 18 Sept 2025, Last Modified: 29 Oct 2025NeurIPS 2025 posterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: A/B testing, experiment monitoring, sequential test, early stopping
TL;DR: We propose a new sequential test for online experiments that permits early stopping without increasing false detection risk and outperforms existing methods in statistical power, as shown in semi-synthetic simulations.
Abstract: The online evaluation of machine learning models is typically conducted through A/B experiments. Sequential statistical tests are valuable tools for analysing these experiments, as they enable researchers to stop data collection early without increasing the risk of false discoveries. However, existing sequential tests either limit the number of interim analyses or suffer from low statistical power. In this paper, we introduce a novel sequential test designed for the continuous monitoring of A/B experiments. We validate our method using semi-synthetic simulations and demonstrate that it outperforms current state-of-the-art sequential testing approaches. Our method is derived using a new technique that inverts a bound on the probability of threshold crossing, based on a classical maximal inequality.
Primary Area: Evaluation (e.g., methodology, meta studies, replicability and validity, human-in-the-loop)
Submission Number: 24228
Loading