Abstract: Given samples from two distributions, a nonparametric two-sample test aims at determining, based on a test statistic, whether the two distributions are equal. This statistic may be computed on the whole dataset, or on a subset of the dataset by a function trained on its complement. We propose a third tier, consisting of functions that exploit a sequential framework to learn the differences while incrementally processing the data. Sequential processing naturally allows optional stopping, which makes our test the first truly sequential nonparametric two-sample test. We show that any sequential predictor can be turned into a sequential two-sample test for which a valid p-value can be computed, yielding controlled type I error. We also show that pointwise universal predictors yield consistent tests, which can be built, in particular, from a nonparametric regressor based on k-nearest neighbors. Finally, we show that mixtures and switch distributions can be used to increase power while preserving consistency.
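To make the construction concrete, the following is a minimal sketch (not the authors' implementation) of the kind of test the abstract describes: a sequential k-nearest-neighbor predictor of each point's sample label is compared against a fixed Bernoulli baseline, the resulting likelihood ratio is a nonnegative martingale under the null hypothesis P = Q, and Ville's inequality turns it into an anytime-valid p-value that supports optional stopping. The interleaving model (labels drawn i.i.d. Bernoulli(θ₀)), the smoothing constant, and all function names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def knn_label_prob(past_points, past_labels, x, k=5):
    """Smoothed kNN estimate of P(label = 1 | x), using strictly past observations."""
    if len(past_points) == 0:
        return 0.5
    k = min(k, len(past_points))
    dists = np.linalg.norm(np.asarray(past_points) - x, axis=1)
    nearest = np.argsort(dists)[:k]
    ones = np.sum(np.asarray(past_labels)[nearest])
    return (ones + 0.5) / (k + 1.0)  # Krichevsky-Trofimov-style smoothing, keeps prob in (0, 1)

def sequential_two_sample_test(stream, alpha=0.05, k=5, theta0=0.5):
    """stream yields (z_t, label_t); labels are assumed i.i.d. Bernoulli(theta0),
    so under H0: P = Q the accumulated likelihood ratio is a nonnegative martingale."""
    llr, past_pts, past_lbls, p_value = 0.0, [], [], 1.0
    for t, (z, lbl) in enumerate(stream, 1):
        p1 = knn_label_prob(past_pts, past_lbls, z, k)      # sequential prediction of the label
        p_pred = p1 if lbl == 1 else 1.0 - p1
        p_null = theta0 if lbl == 1 else 1.0 - theta0
        llr += np.log(p_pred) - np.log(p_null)              # log-likelihood ratio vs. the null
        past_pts.append(z)
        past_lbls.append(lbl)
        p_value = float(np.exp(-max(llr, 0.0)))             # anytime-valid p-value (Ville's inequality)
        if p_value < alpha:
            return True, p_value, t                         # reject H0, with optional stopping
    return False, p_value, len(past_pts)

def make_stream(n, shift, rng):
    """Toy stream: a fair coin picks the source; source 1 is mean-shifted by `shift`."""
    for _ in range(n):
        lbl = int(rng.random() < 0.5)
        yield rng.normal(shift * lbl, 1.0, size=2), lbl

rng = np.random.default_rng(1)
print("P = Q :", sequential_two_sample_test(make_stream(400, 0.0, rng)))  # should rarely reject
print("P != Q:", sequential_two_sample_test(make_stream(400, 1.0, rng)))  # should reject early
```

Replacing the kNN predictor with any other sequential probability assignment (or a mixture/switch over several of them, as the abstract suggests) leaves the validity argument unchanged, since only the predictor's restriction to past data matters.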