TL;DR: This paper proposes a novel transfer learning-based label completion (TLLC) algorithm.
Abstract: Label completion serves as a preprocessing approach for handling the sparse crowdsourced label matrix problem, significantly boosting the effectiveness of downstream label aggregation. In recent advances, worker modeling has proven to be a powerful strategy for further improving the performance of label completion. However, in real-world scenarios, workers typically annotate only a few instances, which leads to insufficient worker modeling and thus limits the improvement of label completion. To address this issue, we propose a novel transfer learning-based label completion (TLLC) method. Specifically, we first identify all high-confidence instances from the whole crowdsourced data as a source domain and use it to pretrain a Siamese network. The abundant annotated instances in the source domain provide essential knowledge for worker modeling. Then, we transfer the pretrained network to the target domain, which consists of the instances annotated by each worker separately, so that worker modeling captures the unique characteristics of each worker. Finally, we leverage the new embeddings learned by the transferred network to complete each worker’s missing labels. Extensive experiments on several widely used real-world datasets demonstrate the effectiveness of TLLC. Our code and datasets are available at https://github.com/jiangliangxiao/TLLC.
Lay Summary: In machine learning, obtaining high-quality annotated data is expensive and time-consuming. Crowdsourcing offers a cost-effective alternative, but because crowd workers often lack expertise and each labels only part of the data, crowdsourced data is often noisy and sparse, which limits the performance of downstream tasks. To address this, label completion has been proposed to fill in the missing labels in crowdsourced data. Ideally, learning workers’ annotation patterns through worker modeling is valuable for label completion. However, the limited number of annotations per worker hinders effective modeling. The proposed Transfer Learning-based Label Completion (TLLC) method overcomes this by leveraging transfer learning. First, it pretrains a Siamese network on high-confidence data (the source domain) to learn general annotation patterns, which avoids modeling workers from scratch. Then, TLLC transfers the pretrained network and fine-tunes it on each individual worker’s data (the target domain) to capture that worker’s unique characteristics, so that worker modeling moves from general annotation patterns to worker-specific ones. Finally, these transferred networks are used to predict missing labels. Through TLLC, worker modeling becomes more effective, thereby improving the performance of label completion.
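Illustrative sketch (not taken from the paper or its repository): the first step described above, selecting high-confidence instances to form the source domain, could look roughly like the following. The abstract does not state how confidence is measured, so the majority-vote agreement ratio, the `threshold` and `min_votes` parameters, the `MISSING = -1` marker, and the function name `select_high_confidence` are all assumptions made for this example.

```python
import numpy as np

MISSING = -1  # assumed marker: this worker did not label this instance


def select_high_confidence(labels, n_classes, threshold=0.8, min_votes=2):
    """Pick instances whose majority label gets at least `threshold` of the votes.

    `labels` is an (n_instances, n_workers) integer matrix with MISSING entries.
    The agreement-ratio criterion is an assumption, not the paper's definition.
    Returns the selected instance indices and their majority-vote labels.
    """
    labels = np.asarray(labels)
    # counts[i, c] = number of workers who assigned class c to instance i
    counts = np.stack(
        [(labels == c).sum(axis=1) for c in range(n_classes)], axis=1
    )
    total = counts.sum(axis=1)          # observed votes per instance
    majority = counts.argmax(axis=1)    # majority-vote label per instance
    # Agreement ratio; instances with no votes keep confidence 0.
    confidence = np.divide(
        counts.max(axis=1),
        total,
        out=np.zeros(len(labels), dtype=float),
        where=total > 0,
    )
    keep = (total >= min_votes) & (confidence >= threshold)
    idx = np.flatnonzero(keep)
    return idx, majority[idx]


# Toy sparse label matrix: 4 instances, 3 workers, 2 classes.
toy = [
    [0, 0, MISSING],              # full agreement on class 0
    [0, 1, MISSING],              # workers disagree
    [1, 1, 1],                    # full agreement on class 1
    [MISSING, MISSING, 0],        # only one vote
]
source_idx, source_labels = select_high_confidence(toy, n_classes=2)
```

Under these assumptions, the selected instances and their majority labels would serve as the source-domain data for pretraining, while each worker's own annotated instances would form that worker's target domain.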
Link To Code: https://github.com/jiangliangxiao/TLLC
Primary Area: General Machine Learning->Supervised Learning
Keywords: Crowdsourcing Learning, Label Completion, Worker Modeling, Transfer Learning
Submission Number: 3322