Keywords: positive-unlabeled learning, semi-supervised learning, pseudo-labeling, deep ensembles, uncertainty quantification
Abstract: Positive-unlabeled (PU) learning aims to learn a binary classifier from only positive and unlabeled training data. Recent approaches address this problem via cost-sensitive learning, developing unbiased loss functions, or via iterative pseudo-labeling to further improve performance. However, two-step procedures are vulnerable to incorrectly estimated pseudo-labels, as errors propagate to later iterations when a new model is trained on erroneous predictions. To mitigate this issue, we propose \textit{PUUPL}, a new loss-agnostic training procedure for PU learning that incorporates epistemic uncertainty into pseudo-labeling. Using an ensemble of neural networks and assigning pseudo-labels only to high-confidence predictions improves the reliability of the pseudo-labels, increases the predictive performance of our method, and leads to new state-of-the-art results in PU learning. With extensive experiments, we show the effectiveness of our method across different datasets, modalities, and learning tasks, as well as improved robustness to misspecified hyper-parameters and biased positive data. The source code of the method and all experiments is available in the supplementary material.
One-sentence Summary: Selecting pseudo-labels based on epistemic uncertainty provides better performance and increased robustness in positive-unlabeled learning.
Supplementary Material: zip
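To make the core idea in the abstract concrete, below is a minimal illustrative sketch (not the authors' released implementation) of pseudo-label selection based on ensemble disagreement as a proxy for epistemic uncertainty. All names here (select_pseudo_labels, ensemble_probs, the threshold tau) are hypothetical and chosen for illustration only.

import numpy as np

def select_pseudo_labels(ensemble_probs: np.ndarray, tau: float = 0.05):
    """Assign pseudo-labels to unlabeled examples whose epistemic
    uncertainty (here approximated by disagreement across ensemble
    members) falls below a threshold.

    ensemble_probs: array of shape (n_models, n_unlabeled) holding each
        model's predicted P(y=1) for every unlabeled example.
    Returns the indices of the selected examples and their hard labels.
    """
    mean_prob = ensemble_probs.mean(axis=0)   # ensemble prediction per example
    epistemic = ensemble_probs.std(axis=0)    # disagreement as uncertainty proxy
    selected = np.where(epistemic < tau)[0]   # keep only low-uncertainty examples
    pseudo_labels = (mean_prob[selected] >= 0.5).astype(int)
    return selected, pseudo_labels

# Toy usage: 5 ensemble members, 8 unlabeled points.
rng = np.random.default_rng(0)
probs = rng.uniform(size=(5, 8))
idx, labels = select_pseudo_labels(probs, tau=0.1)
print(idx, labels)

In an iterative scheme of this kind, the selected examples and their pseudo-labels would be added to the training set before the ensemble is retrained; the threshold tau controls the trade-off between pseudo-label coverage and reliability.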