Keywords: Crowdsourcing, Label Aggregation, Confidence Calibration, Label Distribution Learning
TL;DR: Label Distribution Learning-based Confidence Calibration for Crowdsourcing
Abstract: Crowdsourcing typically collects multiple noisy labels for each instance and then aggregates these labels to infer its unknown true label. We discover that miscalibration, a well-known issue in supervised learning, also frequently arises in label aggregation. Miscalibration prevents existing label aggregation methods from assigning accurate confidence to the aggregated labels they infer, yet in the downstream tasks of label aggregation, the aggregated labels and their associated confidence are equally important. To address this issue, in this paper we formally define confidence calibration for crowdsourcing and propose a novel Label Distribution Learning-based Confidence Calibration (LDLCC) method. Specifically, to mitigate the impact of noisy labels, we first identify high-confidence instances and sharpen their label distributions based on the label aggregation results. Then, to avoid the overconfidence caused by the translation invariance of softmax, we train a regression network to learn the label distribution of each instance. Finally, to obtain the calibrated confidence of each aggregated label, we normalize the distribution learned by the regression network and take its maximum value. Extensive experimental results indicate that LDLCC can serve as a universal post-processing method that calibrates the confidence of each aggregated label and thus further enhances the performance of downstream tasks.
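To make the three-step pipeline sketched in the abstract concrete, the following Python snippet illustrates one plausible reading of it: sharpening the label distributions of high-confidence instances, fitting a regression network to those distributions (no softmax output), and taking the maximum of the renormalized prediction as the calibrated confidence. All names, the threshold tau, the sharpening temperature, and the use of an MSE loss are illustrative assumptions for this sketch, not the authors' actual implementation.

```python
# Hedged sketch of an LDLCC-style post-processing pipeline (not the paper's code).
import numpy as np
import torch
import torch.nn as nn

def aggregate_soft_labels(noisy_labels, n_classes):
    """Turn each instance's crowdsourced labels into an empirical label distribution
    (simple frequency-based aggregation used here only as a stand-in)."""
    dists = np.zeros((len(noisy_labels), n_classes))
    for i, labels in enumerate(noisy_labels):
        for y in labels:
            dists[i, y] += 1
        dists[i] /= dists[i].sum()
    return dists

def sharpen_high_confidence(dists, tau=0.8, temperature=0.5):
    """Step 1 (assumed form): sharpen the distributions of instances whose aggregated
    confidence (max probability) reaches a threshold tau."""
    sharpened = dists.copy()
    mask = dists.max(axis=1) >= tau
    powered = sharpened[mask] ** (1.0 / temperature)
    sharpened[mask] = powered / powered.sum(axis=1, keepdims=True)
    return sharpened

class RegressionNet(nn.Module):
    """Step 2: a small regression network that predicts label distributions directly,
    without a final softmax (avoiding its translation invariance)."""
    def __init__(self, in_dim, n_classes, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_classes)
        )

    def forward(self, x):
        return self.net(x)

def calibrated_confidence(model, features):
    """Step 3: renormalize the regressed distribution and take its maximum as the
    calibrated confidence of each aggregated label."""
    with torch.no_grad():
        out = model(features).clamp(min=0)                       # keep values non-negative
        out = out / out.sum(dim=1, keepdim=True).clamp(min=1e-8)  # renormalize
    conf, labels = out.max(dim=1)
    return labels.numpy(), conf.numpy()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k = 200, 10, 3
    X = rng.normal(size=(n, d)).astype(np.float32)
    noisy = [rng.integers(0, k, size=5).tolist() for _ in range(n)]  # toy crowdsourced labels

    targets = torch.tensor(
        sharpen_high_confidence(aggregate_soft_labels(noisy, k)), dtype=torch.float32
    )
    feats = torch.tensor(X)

    model = RegressionNet(d, k)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()  # regression toward the (sharpened) label distributions
    for _ in range(200):
        opt.zero_grad()
        loss_fn(model(feats), targets).backward()
        opt.step()

    labels, conf = calibrated_confidence(model, feats)
    print(labels[:5], conf[:5])
```

As a post-processing view, the aggregated labels themselves can come from any existing label aggregation method; the sketch only recalibrates the confidence attached to them.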
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 15688