Unbiased Multi-Label Learning from Crowdsourced Annotations

Published: 02 May 2024, Last Modified: 25 Jun 2024. ICML 2024 Poster. License: CC BY 4.0.
Abstract: This work studies the novel Crowdsourced Multi-Label Learning (CMLL) problem, where each instance is associated with multiple true labels but the model only receives unreliable labels from different annotators. Although a few Crowdsourced Multi-Label Inference (CMLI) methods have been developed, they require both the training and testing sets to be assigned crowdsourced labels and focus on inferring true labels rather than predicting them, which limits their practicality. In this paper, by analyzing the generation process of crowdsourced labels, we establish the first **unbiased risk estimator** for CMLL based on the crowdsourced transition matrices. To facilitate transition matrix estimation, we extend our unbiased risk estimator by aggregating crowdsourced labels and transition matrices from all annotators while preserving its theoretical guarantees. Building on the unbiased risk estimator, we further propose a decoupled autoencoder framework to exploit label correlations and boost performance. We also provide a generalization error bound to ensure the convergence of the empirical risk estimator. Experiments on various CMLL scenarios demonstrate the effectiveness of our proposed method. The source code is available at https://github.com/MingxuanXia/CLEAR.
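As a rough illustrative sketch (not the paper's actual estimator, which additionally aggregates labels and transition matrices across annotators), a transition-matrix-based unbiased risk estimator commonly takes a standard label-wise backward-correction form; the per-label $2\times 2$ matrices $T_j$ and the loss vector $\boldsymbol{\ell}_j$ below are assumed for exposition only:

$$
\ell_j^{\leftarrow}\big(f_j(x), \tilde{y}_j\big) \;=\; \big[\, T_j^{-1}\, \boldsymbol{\ell}_j(f_j(x)) \,\big]_{\tilde{y}_j},
\qquad
\widehat{R}(f) \;=\; \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{q} \ell_j^{\leftarrow}\big(f_j(x_i),\, \tilde{y}_{ij}\big),
$$

where $T_j[a,b] = P(\tilde{y}_j = b \mid y_j = a)$ is the transition matrix for label $j$ and $\boldsymbol{\ell}_j(f_j(x))$ stacks the losses for $y_j \in \{0,1\}$. Under this construction, $\mathbb{E}_{\tilde{y}_j \mid y_j}\big[\ell_j^{\leftarrow}(f_j(x), \tilde{y}_j)\big] = \ell_j(f_j(x), y_j)$, so the corrected empirical risk over noisy labels is an unbiased estimate of the clean risk.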
Submission Number: 2678