Multi-annotator Deep Learning: A Probabilistic Framework for Classification

Published: 07 Sept 2023, Last Modified: 07 Sept 2023. Accepted by TMLR.
Abstract: Solving complex classification tasks using deep neural networks typically requires large amounts of annotated data. However, corresponding class labels are noisy when provided by error-prone annotators, e.g., crowdworkers. Training standard deep neural networks leads to subpar performance in such multi-annotator supervised learning settings. We address this issue by presenting a probabilistic training framework named multi-annotator deep learning (MaDL). A downstream ground truth and an annotator performance model are jointly trained in an end-to-end learning approach. The ground truth model learns to predict instances' true class labels, while the annotator performance model infers probabilistic estimates of annotators' performances. A modular network architecture enables us to make varying assumptions regarding annotators' performances, e.g., an optional class or instance dependency. Further, we learn annotator embeddings to estimate annotators' densities within a latent space as proxies of their potentially correlated annotations. Together with a weighted loss function, we improve the learning from correlated annotation patterns. In a comprehensive evaluation, we examine three research questions about multi-annotator supervised learning. Our findings show MaDL's state-of-the-art performance and robustness against many correlated, spamming annotators.
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: For the camera-ready version, we've made the following changes:
- deanonymization of authors, acknowledgements, and the link to the associated GitHub repository,
- minor textual adjustments (e.g., spelling, punctuation),
- and addition of another small example of an annotator feature in the problem setting.
Supplementary Material: pdf
Assigned Action Editor: ~Jasper_Snoek1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 1041
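
Illustrative sketch: The abstract describes a ground truth model and an annotator performance model trained jointly on noisy annotations. One common way to couple two such models, assumed here purely for illustration and not taken from the paper's code, is to treat the true label as a latent variable and marginalize it out with a per-annotator confusion matrix. The function names and numbers below are hypothetical, and the annotator embeddings and weighted loss mentioned in the abstract are omitted.

```python
import math

def annotation_probs(class_probs, confusion):
    """Marginalize the latent true label y:
    p(z | x, a) = sum_y p(y | x) * p(z | y, x, a)."""
    n_classes = len(class_probs)
    return [
        sum(class_probs[y] * confusion[y][z] for y in range(n_classes))
        for z in range(n_classes)
    ]

def annotation_nll(class_probs, confusion, observed_label):
    """Negative log-likelihood of one observed (possibly noisy) annotation."""
    return -math.log(annotation_probs(class_probs, confusion)[observed_label])

# Hypothetical outputs of the two models for one instance-annotator pair.
class_probs = [0.8, 0.2]   # assumed ground truth model output: p(y | x)
confusion = [              # assumed annotator model output: row = true y, column = annotation z
    [0.9, 0.1],
    [0.3, 0.7],
]

probs = annotation_probs(class_probs, confusion)
loss = annotation_nll(class_probs, confusion, observed_label=0)
print(probs, loss)
```

In an end-to-end setup, both `class_probs` and `confusion` would come from neural networks, and a loss of this kind, summed over all observed annotations, would be minimized with respect to both sets of parameters.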