Complementary Sum Sampling for Likelihood Approximation in Large Scale Classification

Aleksandar Botev, Bowen Zheng, David Barber

Published: 2017, Last Modified: 30 Sept 2024, AISTATS 2017, CC BY-SA 4.0

Abstract: We consider training probabilistic classifiers when the number of classes is too large to perform exact normalisation over all classes. We show that the high variance of standard sampling approximations arises from simply not including the correct class of the datapoint in the approximation. To account for this, we explicitly sum over a subset of classes and sample the remaining classes. We show that this simple approach is competitive with recently introduced non-likelihood-based approximations.
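
The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a softmax classifier with a vector of per-class scores, and it assumes the remaining classes are sampled uniformly with replacement, neither of which the abstract specifies. The function names (`css_log_partition`, `css_log_likelihood`) and parameters are hypothetical. The normalising sum is split into an exact part over an explicit subset, which always contains the correct class, and an importance-weighted sampled part over the complement.

```python
import numpy as np


def _logsumexp(terms):
    # Numerically stable log(sum(exp(terms))).
    m = terms.max()
    return m + np.log(np.exp(terms - m).sum())


def css_log_partition(scores, explicit, num_samples, rng):
    """Estimate log sum_c exp(scores[c]).

    Classes in `explicit` are summed exactly. The rest are approximated by
    sampling (assumption: uniform proposal, sampling with replacement).
    """
    explicit = np.unique(np.asarray(explicit, dtype=np.int64))
    mask = np.ones(scores.shape[0], dtype=bool)
    mask[explicit] = False
    rest = np.flatnonzero(mask)  # classes not summed explicitly

    exact_part = scores[explicit]
    if rest.size == 0:
        return _logsumexp(exact_part)

    sampled = rng.choice(rest, size=num_samples, replace=True)
    # Each sample stands in for rest.size / num_samples classes, which makes
    # the estimate of the sum (not of its log) unbiased.
    sampled_part = scores[sampled] + np.log(rest.size / num_samples)
    return _logsumexp(np.concatenate([exact_part, sampled_part]))


def css_log_likelihood(scores, label, extra_explicit, num_samples, rng):
    """Approximate log p(label) with the correct class always summed exactly."""
    explicit = np.union1d(
        np.asarray([label], dtype=np.int64),
        np.asarray(extra_explicit, dtype=np.int64),
    )
    return scores[label] - css_log_partition(scores, explicit, num_samples, rng)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = rng.normal(size=10_000)  # hypothetical scores for 10,000 classes
    label = 42
    exact = scores[label] - _logsumexp(scores)
    approx = css_log_likelihood(scores, label, np.arange(100), 200, rng)
    print(f"exact: {exact:.4f}  approx: {approx:.4f}")
```

Because the correct class always appears in the exact part, the estimated normaliser is never smaller than the correct class's own term, so the approximate log-likelihood cannot exceed zero. A plain sampling estimate that omits the correct class has no such bound.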