Beyond the Selected Completely At Random Assumption for Learning from Positive and Unlabeled Data
Abstract: Most positive and unlabeled data is subject to selection
biases. The labeled examples can, for example, be selected from the
positive set because they are easier to obtain or more obviously positive.
This paper investigates how learning can be enabled in this setting. We
propose and theoretically analyze an empirical-risk-based method for
incorporating the labeling mechanism. Additionally, we investigate the
assumptions under which learning remains possible when the labeling
mechanism is not fully understood, and propose a practical method for that setting. Our
empirical analysis supports the theoretical results and shows that taking
into account the possibility of a selection bias, even when the labeling
mechanism is unknown, improves the trained classifiers.
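
As a rough illustration of the kind of estimator the abstract refers to, the sketch below computes a propensity-weighted empirical risk for positive-unlabeled data under a Selected At Random labeling mechanism, assuming the propensity score e(x) = P(labeled | x, positive) is known for every example. The function names, the logistic model, and the plain gradient-descent fit are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def _positive_weights(s, e):
    """Weight s/e that each example puts on the 'treat as positive' loss term."""
    w = np.zeros(len(s), dtype=float)
    w[s == 1] = 1.0 / e[s == 1]
    return w

def pu_weighted_risk(p, s, e, eps=1e-12):
    """Propensity-weighted log-loss estimate of the classification risk.

    p : (n,) predicted probabilities P(y = 1 | x)
    s : (n,) 1 if the example is labeled (hence positive), 0 if unlabeled
    e : (n,) propensity scores P(s = 1 | x, y = 1); only used where s == 1

    Each example contributes s/e times the positive loss and (1 - s/e) times
    the negative loss; under SAR with known propensities this is an unbiased
    estimate of the risk on fully labeled data.
    """
    w_pos = _positive_weights(s, e)
    loss_pos = -np.log(p + eps)
    loss_neg = -np.log(1.0 - p + eps)
    return np.mean(w_pos * loss_pos + (1.0 - w_pos) * loss_neg)

def fit_logistic_pu(X, s, e, lr=0.1, n_iter=2000):
    """Minimize the weighted risk for a logistic model with plain gradient descent."""
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])   # append an intercept column
    theta = np.zeros(d + 1)
    w_pos = _positive_weights(s, e)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ theta))
        # Gradient of the weighted log-loss reduces to (p - s/e) per example,
        # where s/e acts as a "soft label" that can exceed 1 for labeled points.
        theta -= lr * (Xb.T @ (p - w_pos)) / n
    return theta
```

The soft label s/e exceeds 1 for labeled examples and is 0 for unlabeled ones; that over-weighting of the labeled positives is exactly what compensates for treating all unlabeled examples as negatives, which is how a selection-biased labeling mechanism can be folded into an otherwise standard empirical risk.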