Consistent Multi-Class Classification from Multiple Unlabeled Datasets

Zixi Wei; Senlin Shu; Yuzhou Cao; Hongxin Wei; Bo An; Lei Feng

Published: 16 Jan 2024, Last Modified: 05 Mar 2024, ICLR 2024 spotlight

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: multi-class classification, multiple unlabeled datasets, learning consistency

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

TL;DR: We conduct a comprehensive study of multi-class classification from multiple unlabeled datasets and propose statistically consistent learning methods for it.

Abstract: Weakly supervised learning aims to construct effective predictive models from imperfectly labeled data. A recent trend in weakly supervised learning has focused on learning an accurate classifier from completely unlabeled data, given only limited supervision such as class priors. In this paper, we consider a newly proposed weakly supervised learning problem called multi-class classification from multiple unlabeled datasets, where only multiple sets of unlabeled data and their class priors (i.e., the proportions of each class) are provided for training the classifier. To solve this problem, we first propose a classifier-consistent method (CCM) based on a probability transition matrix. However, CCM cannot guarantee risk consistency and lacks purified supervision information during training. Therefore, we further propose a risk-consistent method (RCM) that progressively purifies supervision information during training by importance weighting. We provide comprehensive theoretical analyses that establish the statistical consistency of both methods. Experimental results on multiple benchmark datasets and various prior matrices demonstrate the superiority of our proposed methods.
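
The two ideas in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' released code; it rests on our own assumptions about the setup. We assume that unlabeled set j is drawn from the mixture sum_k theta[j, k] * p(x | y = k), where theta is the given prior matrix. We take the transition matrix to be T[k, j] = p(set j | y = k), obtained by Bayes' rule from theta and the set sizes. Under this reading, a CCM-style loss is cross-entropy on the observed set index after mapping the model's class posteriors through T. An RCM-style loss is per-class cross-entropy weighted by the estimated posterior p(y = k | x, set j), recomputed from the current model as training proceeds. All function names here (transition_matrix, ccm_loss, rcm_weights, rcm_loss) are hypothetical.

```python
import numpy as np


def transition_matrix(theta, set_sizes):
    """Hypothetical helper: T[k, j] = p(set j | class k), shape (K, m).

    Assumes theta[j, k] is the prior of class k in unlabeled set j and
    set_sizes[j] is the number of samples in set j.
    """
    theta = np.asarray(theta, dtype=float)
    pi = np.asarray(set_sizes, dtype=float)
    pi = pi / pi.sum()                         # p(set j)
    joint = pi[:, None] * theta                # p(set j, class k), shape (m, K)
    return (joint / joint.sum(axis=0, keepdims=True)).T


def softmax(logits):
    # Numerically stable row-wise softmax: model's p(y = k | x).
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def ccm_loss(logits, set_idx, T):
    """CCM-style loss (assumed form): cross-entropy on the set index.

    p(set j | x) = sum_k p(y = k | x) * p(set j | y = k) = (probs @ T)[:, j].
    """
    set_probs = softmax(logits) @ T            # shape (n, m)
    picked = set_probs[np.arange(len(set_idx)), set_idx]
    return -np.mean(np.log(picked + 1e-12))


def rcm_weights(logits, set_idx, T):
    """Importance weights p(y = k | x, set j), estimated from the current model.

    In an autodiff framework these would be treated as constants (detached).
    """
    w = softmax(logits) * T[:, set_idx].T      # p(y=k|x) * p(set j|y=k), shape (n, K)
    return w / w.sum(axis=1, keepdims=True)


def rcm_loss(logits, set_idx, T, weights):
    """RCM-style loss (assumed form): importance-weighted cross-entropy over classes."""
    log_probs = np.log(softmax(logits) + 1e-12)
    return -np.mean(np.sum(weights * log_probs, axis=1))


# Toy usage: 3 unlabeled sets, 3 classes, equal set sizes (illustrative values only).
theta = [[0.7, 0.2, 0.1],
         [0.1, 0.6, 0.3],
         [0.2, 0.2, 0.6]]
T = transition_matrix(theta, [100, 100, 100])
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 3))
set_idx = np.array([0, 1, 2, 0, 1])
weights = rcm_weights(logits, set_idx, T)
print(ccm_loss(logits, set_idx, T), rcm_loss(logits, set_idx, T, weights))
```

The difference between the two losses is where the supervision enters. The CCM-style loss only ever compares predictions against the noisy set index. The RCM-style loss turns each sample's set membership into soft class targets that sharpen as the classifier improves, which is one way to read "progressively purifies supervision information" above.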

Anonymous URL: I certify that there is no URL (e.g., GitHub page) that could be used to find authors' identity.

Supplementary Material: zip

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning

Submission Number: 4492
