KD-Crowd: A Knowledge Distillation Framework for Learning from Crowds
Abstract: Recently, crowdsourcing has established itself as an efficient labeling solution by distributing tasks to crowd workers. Because workers have diverse expertise and can make mistakes, a core learning task is to estimate each worker's expertise and to aggregate their annotations to infer the latent true labels. In this paper, we show that noise-transition-matrix-based worker expertise modeling, one of the major research directions, commonly overfits the annotation noise, either because of oversimplified noise assumptions or because of inaccurate estimation of the transition matrices.
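To make this concrete, the sketch below shows a typical transition-matrix-based crowdsourcing model: a backbone classifier predicts the latent true label, and a per-worker K x K transition matrix maps that prediction to the worker's annotation distribution, which is then fit to the observed annotations by maximum likelihood. This is a minimal PyTorch sketch under our own assumptions (names such as TransitionCrowdModel, backbone, and annotation_nll are illustrative), not the exact model studied in the paper.

```python
# Minimal sketch (not the paper's exact model) of transition-matrix-based worker
# expertise modeling: each worker r owns a K x K matrix T_r whose entry
# T_r[i, j] ~ p(annotation = j | true label = i).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransitionCrowdModel(nn.Module):
    def __init__(self, backbone: nn.Module, num_workers: int, num_classes: int):
        super().__init__()
        self.backbone = backbone  # predicts the latent true label from the input
        # One unnormalized transition matrix per worker, initialized near the
        # identity so each worker starts out "mostly reliable".
        init = torch.eye(num_classes).repeat(num_workers, 1, 1) * 4.0
        self.worker_logits = nn.Parameter(init)

    def forward(self, x, worker_ids):
        clean_post = F.softmax(self.backbone(x), dim=-1)       # p(y = i | x), shape (B, K)
        T = F.softmax(self.worker_logits[worker_ids], dim=-1)  # rows sum to 1, shape (B, K, K)
        # p(annotation = j | x, worker) = sum_i p(y = i | x) * T[i, j]
        noisy_post = torch.bmm(clean_post.unsqueeze(1), T).squeeze(1)
        return clean_post, noisy_post

def annotation_nll(noisy_post, annotations):
    # Maximum-likelihood fit to the crowd annotations; with an oversimplified or
    # poorly estimated T, this objective can simply absorb the annotation noise.
    return F.nll_loss(torch.log(noisy_post + 1e-8), annotations)
```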
To solve this problem, we propose a knowledge distillation framework (KD-Crowd) that combines the complementary strengths of noise-model-free robust learning techniques and transition-matrix-based worker expertise modeling. The framework consists of two stages. In stage 1, a noise-model-free robust student model is trained by treating the predictions of a transition-matrix-based crowdsourcing teacher model as noisy labels, with the aim of correcting the teacher's mistakes and obtaining better true-label predictions. In stage 2, the roles are switched: a better crowdsourcing model is retrained on the crowds' annotations, supervised by the refined true-label predictions produced in stage 1.
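The outline below sketches how the two stages could be wired together. Here train_crowd_teacher and train_robust_student are hypothetical placeholders for, respectively, any transition-matrix crowdsourcing trainer and any noise-model-free robust-learning routine, and the methods predict_true_labels and predict are likewise assumptions; the snippet is a structural sketch, not the paper's implementation.

```python
# Structural sketch of KD-Crowd's two stages under the assumptions stated above.
def kd_crowd(inputs, crowd_annotations, train_crowd_teacher, train_robust_student,
             distill_loss):
    # A conventional crowdsourcing teacher fit on the raw crowd annotations.
    teacher = train_crowd_teacher(inputs, crowd_annotations)

    # Stage 1: treat the teacher's true-label predictions as noisy labels and
    # train a noise-model-free robust student to correct the teacher's mistakes.
    pseudo_labels = teacher.predict_true_labels(inputs)
    student = train_robust_student(inputs, pseudo_labels)

    # Stage 2: switch roles. Retrain the crowdsourcing model on the crowd
    # annotations while also supervising its true-label predictions with the
    # student's refined predictions through the distillation loss (e.g. MIG^f).
    refined_predictions = student.predict(inputs)
    teacher = train_crowd_teacher(inputs, crowd_annotations,
                                  soft_targets=refined_predictions,
                                  distill_loss=distill_loss)
    return teacher, student
```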
Additionally, we propose an f-mutual information gain (MIG^f) based knowledge distillation loss, which finds the maximum information intersection between the student's and the teacher's predictions. Experiments show that MIG^f achieves clear improvements over the standard KL-divergence knowledge distillation loss, which tends to force the student to memorize all of the information in the teacher's prediction, including its errors. We further conduct extensive experiments showing that, as a universal framework, KD-Crowd substantially improves previous crowdsourcing methods on both true-label prediction and worker expertise estimation.
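For intuition about the distillation loss, the sketch below implements one common KL-instance of an f-mutual information gain estimator over a batch of student and teacher class posteriors. The function name mig_kl, the in-batch estimator, and the class_prior argument are our assumptions for illustration; the paper's exact MIG^f definition may differ.

```python
import torch

def mig_kl(student_probs, teacher_probs, class_prior):
    """Sketch of a KL-instance f-mutual information gain between the student's
    and the teacher's per-example class posteriors, both of shape (n, K).
    Maximizing it rewards the information the two predictions share, rather
    than forcing the student to copy the teacher's output bit for bit."""
    # r[i, j] = sum_c student[i, c] * teacher[j, c] / prior[c]
    r = student_probs @ (teacher_probs / class_prior).t()
    n = r.shape[0]
    # Matched pairs (i, i) estimate the expectation under the joint distribution;
    # mismatched pairs (i, j) estimate it under the product of the marginals.
    joint_term = torch.log(torch.diagonal(r) + 1e-8).mean()
    marginal_term = (r.sum() - torch.diagonal(r).sum()) / (n * (n - 1))
    return joint_term - marginal_term + 1.0  # maximize this (or minimize its negative)
```

Whereas minimizing a KL distillation loss pushes the student to reproduce the teacher's full output, errors included, maximizing a mutual-information-style gain of this kind only rewards the predictions' shared information, which matches the intuition stated in the abstract.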