KD-Crowd: A Knowledge Distillation Framework for Learning from Crowds
Abstract: Recently, crowdsourcing has established itself as an efficient labeling solution by distributing tasks to crowd workers. Because workers have diverse expertise and can make mistakes, a core learning task is to estimate each worker's expertise and to aggregate their annotations to infer the latent true labels. In this paper, we show that noise-transition-matrix-based worker expertise modeling, one of the major research directions, commonly overfits the annotation noise, either because of oversimplified noise assumptions or because of inaccurate estimation of the transition matrices.
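To make this concrete, the sketch below shows a typical transition-matrix-based crowdsourcing model: a backbone classifier predicts the latent true label, and a per-worker K x K transition matrix maps that prediction to the worker's annotation distribution, which is then fit to the observed annotations by maximum likelihood. This is a minimal PyTorch sketch under our own assumptions (names such as TransitionCrowdModel, backbone, and annotation_nll are illustrative), not the exact model studied in the paper.

```python
# Minimal sketch (not the paper's exact model) of transition-matrix-based worker
# expertise modeling: each worker r owns a K x K matrix T_r whose entry
# T_r[i, j] ~ p(annotation = j | true label = i).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransitionCrowdModel(nn.Module):
    def __init__(self, backbone: nn.Module, num_workers: int, num_classes: int):
        super().__init__()
        self.backbone = backbone  # predicts the latent true label from the input
        # One unnormalized transition matrix per worker, initialized near the
        # identity so each worker starts out "mostly reliable".
        init = torch.eye(num_classes).repeat(num_workers, 1, 1) * 4.0
        self.worker_logits = nn.Parameter(init)

    def forward(self, x, worker_ids):
        clean_post = F.softmax(self.backbone(x), dim=-1)       # p(y = i | x), shape (B, K)
        T = F.softmax(self.worker_logits[worker_ids], dim=-1)  # rows sum to 1, shape (B, K, K)
        # p(annotation = j | x, worker) = sum_i p(y = i | x) * T[i, j]
        noisy_post = torch.bmm(clean_post.unsqueeze(1), T).squeeze(1)
        return clean_post, noisy_post

def annotation_nll(noisy_post, annotations):
    # Maximum-likelihood fit to the crowd annotations; with an oversimplified or
    # poorly estimated T, this objective can simply absorb the annotation noise.
    return F.nll_loss(torch.log(noisy_post + 1e-8), annotations)
```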
To solve this problem, we propose a knowledge distillation framework (KD-Crowd) that combines the complementary strengths of noise-model-free robust learning techniques and transition-matrix-based worker expertise modeling. The framework consists of two stages. In stage 1, a noise-model-free robust student model is trained by treating the predictions of a transition-matrix-based crowdsourcing teacher model as noisy labels, with the aim of correcting the teacher's mistakes and obtaining better true-label predictions. In stage 2, the roles are switched: a better crowdsourcing model is retrained on the crowds' annotations, supervised by the refined true-label predictions produced in stage 1.
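The outline below sketches how the two stages could be wired together. Here train_crowd_teacher and train_robust_student are hypothetical placeholders for, respectively, any transition-matrix crowdsourcing trainer and any noise-model-free robust-learning routine, and the methods predict_true_labels and predict are likewise assumptions; the snippet is a structural sketch, not the paper's implementation.

```python
# Structural sketch of KD-Crowd's two stages under the assumptions stated above.
def kd_crowd(inputs, crowd_annotations, train_crowd_teacher, train_robust_student,
             distill_loss):
    # A conventional crowdsourcing teacher fit on the raw crowd annotations.
    teacher = train_crowd_teacher(inputs, crowd_annotations)

    # Stage 1: treat the teacher's true-label predictions as noisy labels and
    # train a noise-model-free robust student to correct the teacher's mistakes.
    pseudo_labels = teacher.predict_true_labels(inputs)
    student = train_robust_student(inputs, pseudo_labels)

    # Stage 2: switch roles. Retrain the crowdsourcing model on the crowd
    # annotations while also supervising its true-label predictions with the
    # student's refined predictions through the distillation loss (e.g. MIG^f).
    refined_predictions = student.predict(inputs)
    teacher = train_crowd_teacher(inputs, crowd_annotations,
                                  soft_targets=refined_predictions,
                                  distill_loss=distill_loss)
    return teacher, student
```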
Additionally, we propose an f-mutual information gain (MIG^f) based knowledge distillation loss, which finds the maximum information intersection between the student's and the teacher's predictions. Experiments show that MIG^f achieves clear improvements over the standard KL-divergence knowledge distillation loss, which tends to force the student to memorize all of the information in the teacher's prediction, including its errors. We further conduct extensive experiments showing that, as a universal framework, KD-Crowd substantially improves previous crowdsourcing methods on both true-label prediction and worker expertise estimation.
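For intuition about the distillation loss, the sketch below implements one common KL-instance of an f-mutual information gain estimator over a batch of student and teacher class posteriors. The function name mig_kl, the in-batch estimator, and the class_prior argument are our assumptions for illustration; the paper's exact MIG^f definition may differ.

```python
import torch

def mig_kl(student_probs, teacher_probs, class_prior):
    """Sketch of a KL-instance f-mutual information gain between the student's
    and the teacher's per-example class posteriors, both of shape (n, K).
    Maximizing it rewards the information the two predictions share, rather
    than forcing the student to copy the teacher's output bit for bit."""
    # r[i, j] = sum_c student[i, c] * teacher[j, c] / prior[c]
    r = student_probs @ (teacher_probs / class_prior).t()
    n = r.shape[0]
    # Matched pairs (i, i) estimate the expectation under the joint distribution;
    # mismatched pairs (i, j) estimate it under the product of the marginals.
    joint_term = torch.log(torch.diagonal(r) + 1e-8).mean()
    marginal_term = (r.sum() - torch.diagonal(r).sum()) / (n * (n - 1))
    return joint_term - marginal_term + 1.0  # maximize this (or minimize its negative)
```

Whereas minimizing a KL distillation loss pushes the student to reproduce the teacher's full output, errors included, maximizing a mutual-information-style gain of this kind only rewards the predictions' shared information, which matches the intuition stated in the abstract.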