crowd-hpo: Realistic Hyperparameter Optimization and Benchmarking for Learning from Crowds with Noisy Labels

Published: 21 Nov 2025, Last Modified: 21 Nov 2025. Accepted by TMLR. License: CC BY 4.0
Authors that are also TMLR Expert Reviewers: ~Marek_Herde1
Abstract: Crowdworking is a cost-efficient solution for acquiring class labels. Since these labels are subject to noise, various approaches to learning from crowds have been proposed. Typically, these approaches are evaluated either with default hyperparameter configurations, which often yield unfair and suboptimal performance, or with hyperparameter configurations tuned on a validation set with ground truth class labels, which represents an often unrealistic scenario. Moreover, the two setups can produce different approach rankings, complicating comparisons across studies. Therefore, we introduce crowd-hpo as a framework for evaluating approaches to learning from crowds, together with criteria for selecting well-performing hyperparameter configurations using only noisy crowd-labeled validation data. Extensive experiments with neural networks demonstrate that these criteria select hyperparameter configurations that improve the generalization performance of learning-from-crowds approaches, measured on separate test sets with ground truth labels. Hence, incorporating such criteria into experimental studies is essential for enabling fairer and more realistic benchmarking.
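To illustrate the core idea of selecting hyperparameter configurations with only noisy crowd-labeled validation data, the following is a minimal sketch of one plausible criterion: validation accuracy against majority-voted crowd labels. The paper's actual criteria may differ, and the train_fn routine, the model.predict interface, and the -1 encoding for missing annotations are assumptions made purely for illustration.

```python
import numpy as np

def majority_vote(crowd_labels, n_classes):
    """Aggregate per-annotator labels (-1 = missing) into one label per sample."""
    agg = np.empty(len(crowd_labels), dtype=int)
    for i, row in enumerate(crowd_labels):
        votes = np.bincount(row[row >= 0], minlength=n_classes)
        agg[i] = votes.argmax()
    return agg

def select_config(configs, train_fn, X_val, crowd_labels_val, n_classes):
    """Pick the configuration whose trained model best matches majority-voted crowd labels."""
    y_val_noisy = majority_vote(crowd_labels_val, n_classes)
    best_cfg, best_acc = None, -np.inf
    for cfg in configs:
        model = train_fn(cfg)                 # hypothetical training routine
        y_pred = model.predict(X_val)         # class predictions on validation inputs
        acc = (y_pred == y_val_noisy).mean()  # noisy-validation accuracy criterion
        if acc > best_acc:
            best_cfg, best_acc = cfg, acc
    return best_cfg, best_acc
```

In this sketch, no ground truth labels are needed at any point during model selection; the test set with ground truth labels is reserved solely for reporting generalization performance afterwards.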
Certifications: Expert Certification
Submission Length: Long submission (more than 12 pages of main content)
Code: https://github.com/ies-research/multi-annotator-machine-learning/tree/crowd-hpo
Assigned Action Editor: ~Takashi_Ishida1
Submission Number: 5410