Positive Unlabeled Learning with a Sequential Selection Bias

Walter Gerych, Thomas Hartvigsen, Luke Buquicchio, Abdulaziz Alajaji, Kavin Chandrasekaran, Hamid Mansoor, Elke A. Rundensteiner, Emmanuel Agu

2022 (modified: 15 Dec 2022), SDM 2022

Abstract: In important domains from video stream analytics to human context recognition, datasets are only partially labeled. Worse yet, the labels are often applied sequentially, as annotators choose labels frame-by-frame or timestep-by-timestep in sequence. Because the labels are not collected independently, the labeling exhibits sequential bias. Unfortunately, current state-of-the-art methods for partially labeled data are rendered ineffective under sequential bias. In this work, we propose DeepSPU, a novel solution to this open sequential bias problem. DeepSPU recovers missing labels by constructing a model of the sequentially biased labeling process itself. This labeling model is then learned jointly with the prediction model that infers the missing labels in an iterative training process. Further, we regulate this training using theoretically justified cost functions that prevent our model from converging to incorrect but low-cost solutions. Our experimental studies demonstrate that DeepSPU consistently outperforms state-of-the-art methods by over 10% on a rich variety of real-world datasets.
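To make the notion of sequential bias concrete, the following is a minimal illustrative simulation. It is not the DeepSPU model. It assumes, purely for illustration, that an annotator's decision to label a positive timestep depends on whether the previous timestep was labeled (a first-order Markov process). The function names and the parameters `p_start` and `p_continue` are hypothetical and do not come from the paper.

```python
import random


def simulate_sequential_labels(y, p_start=0.1, p_continue=0.9, seed=0):
    """Label positives with an assumed first-order Markov labeling process.

    y: list of 0/1 true classes, one per timestep.
    Returns s, a list of 0/1 where s[t] == 1 means timestep t was labeled.
    Negatives are never labeled, as in the positive-unlabeled setting.
    """
    rng = random.Random(seed)
    s = []
    prev_labeled = False
    for y_t in y:
        # Assumption: labeling is more likely right after a labeled timestep.
        p = p_continue if prev_labeled else p_start
        labeled = (y_t == 1) and (rng.random() < p)
        s.append(1 if labeled else 0)
        prev_labeled = labeled
    return s


def mean_run_length(s):
    """Average length of consecutive runs of labeled timesteps."""
    runs, current = [], 0
    for v in s:
        if v:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return sum(runs) / len(runs) if runs else 0.0


# Toy stream in which every timestep is truly positive.
y = [1] * 2000

# Sequentially biased labeling: labels arrive in long contiguous runs.
sequential = simulate_sequential_labels(y, p_start=0.1, p_continue=0.9)

# Independent labeling: setting both probabilities equal removes the
# dependence on the previous timestep.
independent = simulate_sequential_labels(y, p_start=0.5, p_continue=0.5)

print("mean run length, sequential: ", mean_run_length(sequential))
print("mean run length, independent:", mean_run_length(independent))
```

Under these assumed parameters, both processes label roughly half of the positives in the long run, but the sequential process concentrates its labels in contiguous runs. That is the kind of dependence that methods assuming independently selected labels do not account for.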
