Keywords: PU learning, causal inference, semi-supervised learning
Abstract: Positive-Unlabeled (PU) learning aims to achieve high-accuracy binary classification with
limited labeled positive examples and numerous unlabeled ones. Existing cost-sensitive PU
methods often rely on the strong assumption that the labeled positive examples were
selected completely at random. In practice, however, labels are unevenly distributed in
many real-world PU problems, which indicates that most actual positive and unlabeled data are subject
to selection bias.
to selection bias. In this paper, we propose a PU learning enhancement (PUe) algorithm
based on causal inference theory, which employs normalized propensity scores and normalized
inverse probability weighting (NIPW) techniques to reconstruct the loss function, thus
obtaining a consistent, unbiased estimate of the classifier and enhancing the model's
performance. Moreover, we investigate and propose a method for estimating propensity scores
in deep learning using regularization techniques when the labeling mechanism is unknown.
Our experiments on three benchmark datasets demonstrate that the proposed PUe algorithm significantly
improves classifier accuracy on datasets with non-uniform label distributions compared to
advanced cost-sensitive PU methods. Code is available at https://github.com/huawei-noah/Noah-research/tree/master/PUe and https://gitee.com/mindspore/models/tree/master/research/cv/PUe.
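The abstract does not spell out the reconstructed loss. As a hedged illustration only, the sketch below implements a propensity-weighted PU risk of the kind used in the selected-at-random (SAR) PU setting. Each labeled example gets weight w = 1/e(x) on its positive loss and 1 - w on its negative loss, and each unlabeled example contributes its negative loss. When e(x) = P(labeled | positive, x) is correct, this matches the fully supervised risk in expectation. The optional self-normalization, which rescales the weights to sum to π·n for an assumed known class prior π, is one possible reading of "normalized" IPW and not necessarily PUe's exact formulation. The function names (`ipw_weights`, `ipw_pu_risk`) and the choice of logistic loss are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def softplus(t: float) -> float:
    """Numerically stable log(1 + exp(t))."""
    if t > 0:
        return t + math.log1p(math.exp(-t))
    return math.log1p(math.exp(t))


def logistic_loss(score: float, label: int) -> float:
    """Logistic loss for a real-valued score and a label in {+1, -1}."""
    return softplus(-label * score)


def ipw_weights(
    labeled: Sequence[bool],
    propensity: Sequence[float],
    prior: Optional[float] = None,
) -> List[float]:
    """Inverse-propensity weights 1/e(x) for labeled rows, 0.0 for unlabeled rows.

    Hypothetical normalization: if `prior` (an assumed known class prior) is
    given, the weights are rescaled so that they sum to prior * n, the
    expected number of positives among n examples.
    """
    for s, e in zip(labeled, propensity):
        if s and not 0.0 < e <= 1.0:
            raise ValueError("labeled examples need a propensity in (0, 1]")
    raw = [1.0 / e if s else 0.0 for s, e in zip(labeled, propensity)]
    if prior is None:
        return raw
    total = sum(raw)
    if total == 0.0:
        raise ValueError("normalization needs at least one labeled example")
    scale = prior * len(labeled) / total
    return [scale * w for w in raw]


def ipw_pu_risk(
    scores: Sequence[float],
    labeled: Sequence[bool],
    propensity: Sequence[float],
    prior: Optional[float] = None,
) -> float:
    """Propensity-weighted empirical PU risk, averaged over all n examples.

    Labeled row:   w * loss(f, +1) + (1 - w) * loss(f, -1)
    Unlabeled row: loss(f, -1)
    """
    weights = ipw_weights(labeled, propensity, prior)
    total = 0.0
    for z, s, w in zip(scores, labeled, weights):
        if s:
            total += w * logistic_loss(z, +1) + (1.0 - w) * logistic_loss(z, -1)
        else:
            total += logistic_loss(z, -1)
    return total / len(scores)


if __name__ == "__main__":
    scores = [2.0, -1.0, 0.5, -0.3]        # hypothetical classifier outputs f(x_i)
    labeled = [True, False, True, False]   # s_i: whether x_i carries a positive label
    propensity = [0.5, 1.0, 0.8, 1.0]      # hypothetical e(x_i); ignored for unlabeled rows
    print(ipw_pu_risk(scores, labeled, propensity))
    print(ipw_pu_risk(scores, labeled, propensity, prior=0.5))
```

Because 1 - w is negative whenever w > 1, the empirical risk can fall below zero on finite samples. Non-negative risk corrections in the style of nnPU are a common remedy. The normalization shown here only rescales the weights to reduce their variance; it does not address that issue.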
Supplementary Material: pdf
Submission Number: 4318