A Modified Logistic Regression for Positive and Unlabeled Learning

Kristen Jaskie; Charles Elkan; Andreas Spanias

Published: 01 Jan 2019, Last Modified: 30 Sept 2024. Venue: ACSSC 2019. License: CC BY-SA 4.0

Abstract: The positive and unlabeled (PU) learning problem is a semi-supervised binary classification problem. In PU learning, only an unknown fraction of the positive samples are labeled, while the remaining samples, both positive and negative, are unlabeled. We wish to learn a decision boundary that separates the positive and negative data distributions. In this paper, we build on a popular existing probabilistic PU learning algorithm and introduce a new modified logistic regression learner with a variable upper bound, which we argue provides a better theoretical solution to this problem. We then apply this solution to both simulated data and a simple image classification problem using the MNIST dataset, obtaining significantly improved results.
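
To make the setup concrete, the following is a minimal sketch of a PU problem and of a logistic-style learner whose upper asymptote is learned rather than fixed at 1. The abstract does not give the functional form, so the curve `1 / (1 + b**2 + exp(-(w*x + w0)))`, the 1-D Gaussian simulation, and the names `make_pu_data`, `bounded_sigmoid`, `fit`, and `label_frac` are all assumptions chosen for illustration, not the authors' implementation.

```python
import numpy as np

def make_pu_data(n=2000, label_frac=0.3, seed=0):
    """Simulate 1-D PU data (hypothetical setup): s=1 is a labeled positive, s=0 is unlabeled."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)                      # hidden true classes
    x = rng.normal(loc=np.where(y == 1, 2.0, -2.0), scale=1.0)
    s = (y == 1) & (rng.random(n) < label_frac)         # only some positives get a label
    return x, y, s.astype(float)

def bounded_sigmoid(x, w, w0, b):
    """Sigmoid-like curve whose upper asymptote is 1 / (1 + b**2) instead of 1 (assumed form)."""
    return 1.0 / (1.0 + b**2 + np.exp(-(w * x + w0)))

def fit(x, s, lr=0.1, steps=5000):
    """Fit w, w0, b by gradient ascent on the Bernoulli log-likelihood of the labels s."""
    w, w0, b = 1.0, 0.0, 1.0
    for _ in range(steps):
        e = np.exp(-(w * x + w0))
        p = 1.0 / (1.0 + b**2 + e)
        g = s / p - (1.0 - s) / (1.0 - p)               # d logL / dp, per sample
        dz = g * p**2 * e                               # chain rule: dp/dz = p^2 * e
        w += lr * np.mean(dz * x)
        w0 += lr * np.mean(dz)
        b += lr * np.mean(g * (-2.0 * b) * p**2)        # chain rule: dp/db = -2 b p^2
    return w, w0, b

if __name__ == "__main__":
    x, y, s = make_pu_data()
    w, w0, b = fit(x, s)
    c_hat = 1.0 / (1.0 + b**2)                          # learned upper bound of the curve
    print(f"learned upper bound: {c_hat:.3f}")
    # Rescale the label model so its upper bound is 1, then threshold at 0.5.
    y_hat = bounded_sigmoid(x, w, w0, b) / c_hat > 0.5
    print(f"accuracy against hidden classes: {np.mean(y_hat == y):.3f}")
```

In this sketch the learned upper bound serves as an estimate of the fraction of positives that carry a label. Dividing the fitted curve by that bound gives a score for the hidden class, under the assumption that labeled positives are drawn at random from all positives.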
