Active Learning with Missing Not At Random Outcomes

Published: 27 Oct 2023, Last Modified: 22 Dec 2023RealML-2023EveryoneRevisionsBibTeX
Keywords: Missing not at random, pool-based, active learning
TL;DR: We design an active learning procedure for settings where outcomes in the training data are missing not at random.
Abstract: When outcomes in training data are missing not at random (MNAR), predictors that are trained on that data can be arbitrarily biased. In some cases, however, batches of missing outcomes can be recovered at some cost, giving rise to a pool-based active learning setting. Previous active learning approaches implicitly treat all labeled data as having come from the same distribution, whereas in the MNAR setting, the training data and the initial unlabeled pool have different distributions. We propose MNAR-Aware Active Learning (MAAL), an active learning procedure that takes this into account and takes advantage of information that the missingness indicator carries about the outcome. We additionally consider acquisition functions that are attuned to the MNAR setting. Experiments on a large set of classification benchmark datasets demonstrate the benefits of our proposed approach over standard active and passive learning approaches.
Submission Number: 46