Probabilistic Matrix Factorization-based Three-stage Label Completion for Crowdsourcing

Published: 2024, Last Modified: 17 May 2025, ICDM 2024, CC BY-SA 4.0
Abstract: Crowdsourcing provides a cost-effective way to obtain large annotated datasets. In real-world crowdsourcing scenarios, most workers annotate only a few instances, which yields a highly sparse crowdsourcing label matrix and in turn harms the performance of label integration algorithms. Probabilistic matrix factorization (PMF) has proven effective for crowdsourcing label completion. However, low-quality labels in both its input and its output limit its performance. To address this, this paper proposes a PMF-based three-stage label completion (PMF-TLC) method. In the first stage, we design a label confidence-based strategy to estimate the quality of each raw label from each worker, and then flip the low-quality labels in the original crowdsourcing label matrix. In the second stage, we run PMF on the flipped label matrix to obtain a completed label matrix with soft labels. In the third stage, we design a between-class margin-based filter to delete the low-quality soft labels in the completed label matrix, and then convert the remaining high-quality soft labels to hard (logical) labels to obtain the final processed label matrix. Extensive experiments on real-world and simulated crowdsourced datasets show that PMF-TLC can significantly improve the performance of label integration algorithms.
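The three stages described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract gives no formulas, so the confidence measure (per-instance vote share), the flipping rule (replace with the majority label), the per-class one-hot PMF fitted by gradient descent, the margin definition (top class probability minus the runner-up), and all thresholds and function names below are assumptions chosen only to make the flow concrete.

```python
import numpy as np

MISSING = -1  # assumed marker for "worker did not label this instance"

def flip_low_confidence(labels, n_classes, conf_threshold=0.3):
    """Stage 1 (assumed rule): if the vote share of a label's class on its
    instance is below the threshold, replace it with the majority label."""
    out = labels.copy()
    for i in range(labels.shape[0]):
        observed = labels[i] != MISSING
        if not observed.any():
            continue
        share = np.bincount(labels[i, observed], minlength=n_classes) / observed.sum()
        # Look up the vote share of each observed label's own class.
        low = observed & (share[np.where(observed, labels[i], 0)] < conf_threshold)
        out[i, low] = share.argmax()
    return out

def pmf_complete(labels, n_classes, rank=2, lam=0.1, lr=0.02, epochs=500, seed=0):
    """Stage 2 (assumed variant): one PMF model per class, fitted on the
    observed one-hot entries by gradient descent with L2 regularization."""
    rng = np.random.default_rng(seed)
    n, m = labels.shape
    mask = (labels != MISSING).astype(float)
    scores = np.zeros((n, m, n_classes))
    for k in range(n_classes):
        target = (labels == k).astype(float)
        U = 0.1 * rng.standard_normal((n, rank))  # instance factors
        V = 0.1 * rng.standard_normal((m, rank))  # worker factors
        for _ in range(epochs):
            err = mask * (U @ V.T - target)  # error on observed entries only
            U, V = U - lr * (err @ V + lam * U), V - lr * (err.T @ U + lam * V)
        scores[:, :, k] = U @ V.T
    # Clip and normalize so each (instance, worker) cell is a soft label.
    scores = np.clip(scores, 1e-6, None)
    return scores / scores.sum(axis=2, keepdims=True)

def margin_filter(soft, margin_threshold=0.2):
    """Stage 3 (assumed rule): delete soft labels whose top-two class margin
    is below the threshold, then convert the rest to hard labels."""
    top2 = np.sort(soft, axis=2)[:, :, -2:]
    margin = top2[:, :, 1] - top2[:, :, 0]
    hard = soft.argmax(axis=2)
    hard[margin < margin_threshold] = MISSING
    return hard

def three_stage_completion(labels, n_classes):
    flipped = flip_low_confidence(labels, n_classes)
    soft = pmf_complete(flipped, n_classes)
    return margin_filter(soft)

# Toy sparse label matrix: rows are instances, columns are workers.
raw = np.array([
    [0, 0, -1, 0, 1],
    [1, -1, 1, 1, -1],
    [-1, 0, 0, -1, 0],
    [1, 1, -1, -1, 1],
])
completed = three_stage_completion(raw, n_classes=2)
print(completed)
```

The output has the same shape as `raw`; cells still marked `-1` are those the margin filter judged too uncertain to keep. A label integration algorithm such as majority voting would then run on this processed matrix.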