TL;DR: This paper proposes a novel label distribution propagation-based label completion (LDPLC) algorithm.
Abstract: In real-world crowdsourcing scenarios, most workers annotate only a few instances, which results in a significantly sparse crowdsourced label matrix and subsequently harms the performance of label integration algorithms. Recent work called worker similarity-based label completion (WSLC) has been proven to be an effective algorithm for addressing this issue. However, WSLC considers solely the correlation of the labels annotated by different workers on each individual instance while totally ignoring the correlation of the labels annotated by different workers among similar instances. To fill this gap, we propose a novel label distribution propagation-based label completion (LDPLC) algorithm. First, we use worker similarity weighted majority voting to initialize a label distribution for each missing label. Then, we design a label distribution propagation algorithm to enable each missing label of each instance to iteratively absorb its neighbors’ label distributions. Finally, we complete each missing label based on its converged label distribution. Experimental results on both real-world and simulated crowdsourced datasets show that LDPLC significantly outperforms WSLC in enhancing the performance of label integration algorithms. Our code and datasets are available at https://github.com/jiangliangxiao/LDPLC.
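The three steps named in the abstract (initialization, propagation, completion) can be illustrated with a minimal Python sketch. It is not the authors' implementation, which lives in the linked repository. The abstract does not give the formulas, so every concrete choice here is an assumption made for illustration: worker similarity is taken as the agreement rate on co-annotated instances, neighbors are the k nearest instances by Euclidean distance on a feature vector, and the update mixes the neighbors' average distribution with the initial one using a hypothetical weight `alpha`. All function and parameter names are likewise hypothetical.

```python
import math

MISSING = None  # assumed marker for an unannotated (instance, worker) cell


def worker_similarity(labels):
    """Assumed measure: agreement rate of two workers on co-annotated instances."""
    n_workers = len(labels[0])
    sim = [[0.0] * n_workers for _ in range(n_workers)]
    for a in range(n_workers):
        for b in range(n_workers):
            if a == b:
                continue
            shared = [(row[a], row[b]) for row in labels
                      if row[a] is not MISSING and row[b] is not MISSING]
            if shared:
                sim[a][b] = sum(x == y for x, y in shared) / len(shared)
    return sim


def init_distribution(row, worker, sim, n_classes):
    """Step 1: similarity-weighted majority vote; observed labels become one-hot."""
    if row[worker] is not MISSING:
        return [1.0 if c == row[worker] else 0.0 for c in range(n_classes)]
    votes = [0.0] * n_classes
    for other, label in enumerate(row):
        if label is not MISSING:
            votes[label] += sim[worker][other]
    total = sum(votes)
    if total == 0:  # no informative votes: fall back to a uniform distribution
        return [1.0 / n_classes] * n_classes
    return [v / total for v in votes]


def nearest_neighbours(features, k):
    """Assumed neighbourhood: the k closest instances by Euclidean distance."""
    result = []
    for i, x in enumerate(features):
        others = sorted((math.dist(x, y), j)
                        for j, y in enumerate(features) if j != i)
        result.append([j for _, j in others[:k]])
    return result


def complete_labels(labels, features, n_classes, k=1, alpha=0.5,
                    n_iter=50, tol=1e-6):
    sim = worker_similarity(labels)
    n, w = len(labels), len(labels[0])
    init = [[init_distribution(labels[i], r, sim, n_classes) for r in range(w)]
            for i in range(n)]
    nbrs = nearest_neighbours(features, k)
    dist = [[list(p) for p in row] for row in init]
    # Step 2: each missing cell absorbs the same worker's distributions
    # from neighbouring instances until the largest change falls below tol.
    for _ in range(n_iter):
        new = [[list(p) for p in row] for row in dist]
        change = 0.0
        for i in range(n):
            for r in range(w):
                if labels[i][r] is not MISSING:
                    continue  # observed labels stay fixed
                for c in range(n_classes):
                    avg = sum(dist[j][r][c] for j in nbrs[i]) / len(nbrs[i])
                    new[i][r][c] = alpha * avg + (1 - alpha) * init[i][r][c]
                    change = max(change, abs(new[i][r][c] - dist[i][r][c]))
        dist = new
        if change < tol:
            break
    # Step 3: fill each missing cell with its most probable class.
    completed = [[labels[i][r] if labels[i][r] is not MISSING
                  else max(range(n_classes), key=lambda c: dist[i][r][c])
                  for r in range(w)] for i in range(n)]
    return completed, dist
```

Calling `complete_labels(labels, features, n_classes)` on a label matrix with `None` in the unannotated cells returns the completed matrix together with the final per-cell distributions. Mixing with the initial distribution keeps the worker-similarity evidence from being washed out by the neighbors; the actual update rule and convergence criterion should be taken from the paper and repository.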
Lay Summary: Crowdsourcing is an efficient and cost-effective method to rapidly obtain large volumes of annotated data. In real-world crowdsourcing scenarios, most workers annotate only a small number of tasks, which leaves a large number of labels missing from the data. This subsequently harms the performance of label integration algorithms, which infer the true labels of tasks. A recent algorithm called worker similarity-based label completion (WSLC) completes missing labels by observing how different workers annotate the same tasks. However, we found that relying solely on this single perspective is insufficient to fully utilize the information in the data. To address this, the proposed label distribution propagation-based label completion (LDPLC) introduces a new perspective by observing how the same worker annotates different tasks. The new perspective allows each missing label not only to absorb the information from similar workers on the corresponding task but also to absorb the information from the corresponding worker across similar tasks. Our findings have implications for the preprocessing of data, which can enhance the performance of label integration algorithms.
Link To Code: https://github.com/jiangliangxiao/LDPLC
Primary Area: General Machine Learning->Supervised Learning
Keywords: Crowdsourcing learning, Label completion, Worker similarity, Label distribution propagation
Submission Number: 3327