Abstract: Positive and Unlabeled Learning (PUL) is a special semi-supervised learning paradigm that trains on datasets comprising only positive and unlabeled samples, and it is widely applied in real-world applications. Due to the lack of reliably labeled negative samples, PUL is considerably more challenging than traditional semi-supervised learning. Previous works primarily proceed on two fronts: applying varying loss weights to different samples, or constructing auxiliary datasets with reliable labels. Despite their impressive performance, these methods still face the challenge of negative classification bias and struggle to dispense with class-prior knowledge. In this paper, we transform the PUL task into a constrained optimization problem and propose a new PUL framework, ALM-PU, to solve it. More specifically, ALM-PU takes minimizing the classifier's classification loss on the reliably labeled set as the ultimate optimization goal, subject to the constraint that the cost on the unlabeled set is reduced to a certain extent. ALM-PU then integrates the primary objective with this constraint to construct the PUL constrained-optimization framework and implements it in a neural network. During training, our approach corrects the model's negative classification bias, achieving superior classification performance compared to previous methods. Additionally, a prediction-sequence-based algorithm is employed to help the classifier better distinguish positive from negative samples using the training results. We conducted extensive experiments on multiple PUL benchmarks; ALM-PU achieves an average improvement of 2% in key metrics, attaining state-of-the-art performance. These findings validate the effectiveness of our ALM-PU approach. Complete code and more experimental details can be found at ALM-PU.
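The name ALM-PU suggests an augmented-Lagrangian treatment of the inequality constraint "cost on the unlabeled set ≤ some tolerance". Purely as an illustrative sketch (the function names, symbols `f_val`, `g_val`, `lam`, `rho`, and the multiplier update are generic augmented-Lagrangian conventions, not details taken from the paper), one step of such a scheme for a constraint g(θ) ≤ 0 might look like:

```python
def augmented_lagrangian(f_val, g_val, lam, rho):
    """Augmented-Lagrangian value for: minimize f(theta) s.t. g(theta) <= 0.

    f_val: primary objective (e.g., loss on the reliably labeled set)
    g_val: constraint value (e.g., unlabeled-set cost minus a tolerance)
    lam:   Lagrange multiplier estimate (>= 0)
    rho:   penalty coefficient (> 0)
    """
    # Standard inequality-constrained AL term:
    # L = f + (max(0, lam + rho*g)^2 - lam^2) / (2*rho)
    psi = max(0.0, lam + rho * g_val)
    return f_val + (psi ** 2 - lam ** 2) / (2.0 * rho)


def update_multiplier(lam, rho, g_val):
    """Classic ALM multiplier step: grows while the constraint is violated,
    shrinks toward zero once g(theta) <= 0 is satisfied."""
    return max(0.0, lam + rho * g_val)
```

In a neural-network setting, `f_val` and `g_val` would be mini-batch losses and the outer loop would alternate gradient steps on the augmented Lagrangian with multiplier updates; when the constraint is satisfied (g ≤ 0 and lam = 0), the penalty term vanishes and only the primary objective drives training.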
External IDs: dblp:journals/ml/WeiWSLD25