Abstract: Many fraud detection techniques for online systems employ deep learning models to identify malicious user activity sessions. In many real-world scenarios, only a few labeled malicious sessions and many unlabeled sessions are available. In such scenarios, the fraud detection problem can be effectively addressed with Positive Unlabeled (PU) learning. However, malicious sessions can be extremely diverse, which makes learning a good decision boundary challenging. In this paper, we present a novel contrastive positive unlabeled learning (ConPU) model for fraud detection and, in particular, propose a contrastive loss function for PU learning. ConPU indirectly approximates the cluster center of normal sessions in the representation space by using the distributions of unlabeled and malicious sessions, predicts labels for sessions in the unlabeled set by comparing their proximity to the normal and malicious session cluster centers in the representation space, and then incorporates both positive pairs and negative pairs into the contrastive loss function. As a result, ConPU can derive separable representations as well as accurate cluster centers of normal and malicious sessions in the representation space. We theoretically demonstrate the efficacy of the loss function developed in ConPU. Additionally, we empirically evaluate ConPU on benchmark datasets, on which it demonstrates substantial performance improvement over state-of-the-art baselines.
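As an illustration of the center-approximation and proximity steps described above, the following is a minimal sketch and not the paper's actual formulation. It assumes that the unlabeled set is a mixture of normal and malicious sessions with a known malicious fraction `prior`, so that the unlabeled mean is a weighted average of the two class means, and that proximity is measured by Euclidean distance. The function names, the `prior` parameter, and the distance choice are all hypothetical; the contrastive loss itself is not shown.

```python
import numpy as np


def estimate_normal_center(z_unlabeled, z_malicious, prior):
    """Approximate the normal-session center without any normal labels.

    Assumption (not from the paper): mean(unlabeled) equals
    prior * mean(malicious) + (1 - prior) * mean(normal),
    which is solved here for mean(normal).
    """
    mu_unlabeled = z_unlabeled.mean(axis=0)
    mu_malicious = z_malicious.mean(axis=0)
    return (mu_unlabeled - prior * mu_malicious) / (1.0 - prior)


def pseudo_label(z_unlabeled, center_normal, center_malicious):
    """Return 1 (malicious) where a session lies closer to the malicious
    center than to the normal center, else 0 (normal)."""
    d_normal = np.linalg.norm(z_unlabeled - center_normal, axis=1)
    d_malicious = np.linalg.norm(z_unlabeled - center_malicious, axis=1)
    return (d_malicious < d_normal).astype(int)


# Toy representations: normal sessions near (0, 0), malicious near (5, 5).
rng = np.random.default_rng(0)
normal_hidden = rng.normal(0.0, 0.1, size=(900, 2))
malicious_hidden = rng.normal(5.0, 0.1, size=(100, 2))
z_unlabeled = np.vstack([normal_hidden, malicious_hidden])
z_malicious = rng.normal(5.0, 0.1, size=(50, 2))  # the few labeled sessions

center_malicious = z_malicious.mean(axis=0)
center_normal = estimate_normal_center(z_unlabeled, z_malicious, prior=0.1)
labels = pseudo_label(z_unlabeled, center_normal, center_malicious)
```

In a contrastive setup, such pseudo-labels could then be used to form positive pairs (same predicted class) and negative pairs (different predicted classes); how ConPU actually constructs and weights these pairs is specified in the paper, not here.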