Improving State-of-the-Art in One-Class Classification by Leveraging Unlabeled Data

Published: 28 Jan 2022, Last Modified: 13 Feb 2023 · ICLR 2022 Submission
Abstract: Recent advances in One-Class (OC) classification combine the ability to learn exclusively from positive examples with the expressive power of deep neural networks. A cornerstone of OC methods is to make assumptions about the negative distribution, e.g., that negative data are scattered uniformly or concentrated at the origin. An alternative approach, employed in Positive-Unlabeled (PU) learning, is to additionally leverage unlabeled data to approximate the negative distribution more precisely. In this paper, our goal is to find the best ways to utilize unlabeled data on top of positive data in different settings. While it is reasonable to expect PU algorithms to outperform OC algorithms thanks to their access to more data, we find that the opposite can be true when the unlabeled data are unreliable, i.e., contain negative examples that are either too few or sampled from a different distribution. As an alternative to using existing PU algorithms, we propose to modify OC algorithms to incorporate unlabeled data. We find that such PU modifications can consistently benefit even from unreliable unlabeled data if they satisfy a crucial property: when the unlabeled data consist exclusively of positive examples, the PU modification becomes equivalent to the original OC algorithm. Our main practical recommendation is to use state-of-the-art PU algorithms when unlabeled data are reliable, and otherwise to use PU modifications of state-of-the-art OC algorithms that satisfy the formulated property. Additionally, we make progress towards distinguishing the cases of reliable and unreliable unlabeled data using statistical tests.
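To make the formulated property concrete, below is a toy sketch, not the paper's actual modification: it uses the well-known non-negative PU risk estimator (nnPU; Kiryo et al., 2017) with a sigmoid surrogate loss and synthetic Gaussian classifier scores, and checks numerically that when the unlabeled set is all-positive and the class prior is set to one, the PU risk collapses to a positive-only objective. The closing two-sample Kolmogorov-Smirnov test is likewise only a generic example of the kind of statistical test that could flag unreliable unlabeled data; the paper's specific tests are not reproduced here.

```python
import numpy as np
from scipy.stats import ks_2samp

def sigmoid_loss(z, y):
    # Surrogate loss l(z, y) = sigmoid(-y * z): small when sign(z) matches y.
    return 1.0 / (1.0 + np.exp(y * z))

def nnpu_risk(scores_pos, scores_unl, prior):
    # Non-negative PU risk (Kiryo et al., 2017):
    #   R = pi * R_p^+ + max(0, R_u^- - pi * R_p^-),
    # where pi is the fraction of positives among the unlabeled data.
    r_pos = sigmoid_loss(scores_pos, +1).mean()      # positives treated as positive
    r_pos_neg = sigmoid_loss(scores_pos, -1).mean()  # positives treated as negative
    r_unl_neg = sigmoid_loss(scores_unl, -1).mean()  # unlabeled treated as negative
    return prior * r_pos + max(0.0, r_unl_neg - prior * r_pos_neg)

rng = np.random.default_rng(0)
scores_pos = rng.normal(2.0, 1.0, 10_000)  # classifier scores on labeled positives
scores_unl = rng.normal(2.0, 1.0, 10_000)  # unlabeled set that is in fact all positive

# With prior = 1 the clipped negative term vanishes in expectation, so the PU
# risk reduces to the positive-only risk -- the property stated in the abstract.
print(nnpu_risk(scores_pos, scores_unl, prior=1.0))  # ~= positive-only risk
print(sigmoid_loss(scores_pos, +1).mean())           # positive-only risk

# A generic reliability check: a two-sample KS test between scores on positives
# and on unlabeled data; a high p-value gives no evidence of a negative component.
res = ks_2samp(scores_pos, scores_unl)
print(f"KS statistic = {res.statistic:.4f}, p-value = {res.pvalue:.4f}")
```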
