Debiased Label Aggregation for Subjective Crowdsourcing Tasks

CHI Extended Abstracts 2022 (modified: 05 Feb 2023)
Abstract: Human Intelligence Tasks (HITs) allow people to collect and curate labeled data from multiple annotators. These labels are then typically aggregated to create an annotated dataset suitable for supervised machine learning. The most popular label aggregation method is majority voting, where each item in the dataset is assigned the most common label among its annotators. This approach is optimal when annotators are unbiased domain experts. In this paper, we propose Debiased Label Aggregation (DLA), an alternative label aggregation method for subjective HITs, where cross-annotator agreement varies. DLA leverages annotators' voting behavior patterns to weight their labels. Our experiments show that DLA outperforms majority voting on several performance metrics, e.g., a 20-point increase in the F1 measure before data augmentation and a 35-point increase in the same measure after data augmentation. Since DLA is simple, we hope it will help researchers tackle subjective labeling tasks.
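The abstract does not spell out DLA's exact weighting rule, so the sketch below is only a minimal illustration of the two ideas it contrasts: plain majority voting versus an aggregation that weights each annotator by their observed voting behavior (here, their agreement with the per-item majority). The function names, data layout, and the specific weighting rule are assumptions for illustration, not the authors' method.

```python
from collections import Counter, defaultdict

def majority_vote(labels_per_item):
    """Baseline: each item gets the most common label among its annotators."""
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in labels_per_item.items()}

def agreement_weighted_vote(annotations):
    """Illustrative agreement-weighted aggregation (an assumption, not the paper's DLA rule).

    annotations: list of (item_id, annotator_id, label) tuples.
    Each annotator is weighted by how often their label matches the
    unweighted majority across the items they annotated; items are then
    assigned the label with the highest weight-summed vote.
    """
    # Group labels per item and per annotator.
    per_item = defaultdict(list)
    per_annotator = defaultdict(list)
    for item, annotator, label in annotations:
        per_item[item].append(label)
        per_annotator[annotator].append((item, label))

    # First pass: unweighted majority label per item.
    majority = majority_vote(per_item)

    # Annotator weight = fraction of their labels matching the majority.
    weight = {}
    for annotator, votes in per_annotator.items():
        agree = sum(1 for item, label in votes if label == majority[item])
        weight[annotator] = agree / len(votes)

    # Second pass: weighted vote per item.
    scores = defaultdict(lambda: defaultdict(float))
    for item, annotator, label in annotations:
        scores[item][label] += weight[annotator]
    return {item: max(label_scores, key=label_scores.get)
            for item, label_scores in scores.items()}

if __name__ == "__main__":
    data = [("q1", "a", "pos"), ("q1", "b", "pos"), ("q1", "c", "neg"),
            ("q2", "a", "neg"), ("q2", "b", "pos"), ("q2", "c", "neg")]
    print(agreement_weighted_vote(data))  # e.g. {'q1': 'pos', 'q2': 'neg'}
```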