Abstract: Self-supervised pretraining on unlabeled data followed by supervised finetuning
on labeled data is a popular paradigm for learning from limited labeled examples.
In this paper, we investigate and extend this paradigm to the classical positive unlabeled (PU) setting: the weakly supervised task of learning a binary classifier using only
a few labeled positive examples and a set of unlabeled samples. We propose
a novel PU learning objective, positive unlabeled Noise Contrastive Estimation
(puNCE), that leverages the available explicit (from labeled samples) and implicit
(from unlabeled samples) supervision to learn useful representations from positive
unlabeled input data. The underlying idea is to assign each training sample an
individual weight; labeled positives are given unit weight; unlabeled samples are
duplicated: one copy is labeled positive with weight π and the other negative with weight
(1 − π), where π denotes the class prior. Extensive experiments across vision
and natural language tasks reveal that puNCE consistently improves over existing
unsupervised and supervised contrastive baselines under limited supervision.
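To make the weighting scheme concrete, here is a minimal sketch of how the per-sample weights and duplicated pseudo-labels could be constructed. This is an illustrative reading of the abstract, not the authors' implementation; the function name, the PyTorch setup, and the way the weights are consumed downstream are all assumptions.

```python
import torch

def punce_weights_and_targets(labels: torch.Tensor, prior: float):
    """Illustrative construction of puNCE-style duplicated targets and weights.

    labels: (N,) tensor, 1 for labeled positives, 0 for unlabeled samples.
    prior:  class prior pi (assumed known or estimated).

    Labeled positives keep a single copy with unit weight.
    Each unlabeled sample is duplicated: one copy is treated as positive with
    weight pi, the other as negative with weight (1 - pi).
    """
    pos_idx = (labels == 1).nonzero(as_tuple=True)[0]
    unl_idx = (labels == 0).nonzero(as_tuple=True)[0]

    # Indices into the original batch: positives once, unlabeled samples twice.
    index = torch.cat([pos_idx, unl_idx, unl_idx])

    # Pseudo-labels: positives stay positive; one unlabeled copy is marked
    # positive, the duplicate is marked negative.
    target = torch.cat([
        torch.ones_like(pos_idx),
        torch.ones_like(unl_idx),
        torch.zeros_like(unl_idx),
    ])

    # Per-sample weights: 1 for labeled positives, pi and (1 - pi) for the
    # positive and negative copies of each unlabeled sample.
    weight = torch.cat([
        torch.ones(len(pos_idx)),
        torch.full((len(unl_idx),), prior),
        torch.full((len(unl_idx),), 1.0 - prior),
    ])
    return index, target, weight
```

A per-sample contrastive loss could then be combined with these weights, e.g. `(weight * per_sample_loss(features[index], target)).sum() / weight.sum()`, where `per_sample_loss` and `features` are placeholders for whatever contrastive objective and encoder outputs are used.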