Unmasking and Exploiting Hidden Strata for Robust and Inclusive Positive Unlabeled Learning

Sayantan Saha; Jiaul H. Paik

20 Sept 2025 (modified: 18 Nov 2025) | ICLR 2026 Conference Withdrawn Submission | CC BY 4.0

Keywords: PU learning, hidden stratification, binary classification

Abstract: Positive–Unlabeled (PU) learning aims to train a binary classifier using only labeled positive data and a large set of unlabeled samples. Although effective, state-of-the-art PU learning methods focus on coarse-grained separation between the positive and negative classes. In real-world datasets, however, $\textit{hidden stratification}$ frequently occurs: the positive class comprises multiple fine-grained subclasses with varying prevalence. Ignoring these latent subclasses biases PU classifiers toward the dominant subclasses of the positive class, leading to systematic misclassification of rare subclasses. To address this challenge, we propose a subclass-aware PU learning method that first discovers the hidden subclasses through a fully automatic and adaptive graph-based approach, then leverages them to select potential negative examples from the unlabeled set. Comprehensive experimental results demonstrate that the method consistently outperforms existing PU learning methods on a range of datasets under various distributional settings of the subclasses. A noteworthy property of the proposed method is that it does not require the number of hidden subclasses as input, which makes it remarkably robust. To the best of our knowledge, our approach is the first to address the hidden subclass issue in PU learning.

Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning

Submission Number: 24294
