Keywords: Positive-Unlabeled learning, robustness
TL;DR: Theoretical analysis of the robustness and training-sample efficiency of prior-based Positive-Unlabeled (PU) risk estimators, and a novel PU risk estimator based on class posterior probabilities corresponding to multiple sources of evidence.
Abstract: Learning from Positive and Unlabeled (PU) data presents unique challenges in scenarios where negative examples are absent. Many state-of-the-art PU methods are prior-based, assuming that the class probability within the unlabeled data equals the class prior. However, this framework often fails to capture the complexities of real-world applications, such as industrial anomaly detection, where variations in data distribution within the combined training set are prevalent. In this paper, we introduce a generalized PU framework that models uncertainty via subset-specific posterior probabilities and propose a posterior-based method (postPU) whose consistency is validated both theoretically and empirically. Further, we establish that sample weighting is fundamental to PU robustness and derive a class-balanced weighting principle that minimizes sensitivity to label inaccuracies. Experiments show the effectiveness and robustness of postPU and its capacity to leverage auxiliary uncertain annotations.
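For context, the prior-based risk that such methods build on can be sketched as follows. This is a minimal non-negative PU (nnPU-style) estimator for illustration only; the sigmoid surrogate loss, the prior value, and the score arrays are assumptions, and this is not the paper's postPU method:

```python
import numpy as np

def sigmoid_loss(z, y):
    # Surrogate loss l(z, y) = sigmoid(-y * z); small when sign(z) == y.
    return 1.0 / (1.0 + np.exp(y * z))

def nn_pu_risk(scores_p, scores_u, prior):
    """Non-negative PU risk estimate from positive and unlabeled scores.

    Uses the prior-based identity: the negative-class risk equals the
    unlabeled risk minus prior-weighted positive risk, clipped at zero.
    (Illustrative sketch, not the postPU estimator from the paper.)
    """
    pos_risk = prior * sigmoid_loss(scores_p, +1).mean()
    # Estimated risk of labeling everything negative, corrected by the
    # prior-weighted contribution of positives hidden in the unlabeled set.
    neg_risk = sigmoid_loss(scores_u, -1).mean() \
        - prior * sigmoid_loss(scores_p, -1).mean()
    return pos_risk + max(neg_risk, 0.0)  # non-negative correction

# Toy usage: positives scored high, unlabeled scored low.
risk = nn_pu_risk(np.array([2.0, 3.0]), np.array([-2.0, -1.0]), prior=0.5)
```

The non-negative clipping is what distinguishes nnPU from the earlier unbiased estimator: without it, flexible models can drive the empirical negative risk below zero and overfit.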
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 954