Abstract: Due to concerns about human error in crowdsourcing, it is standard practice to collect labels for the same data point from multiple internet workers. We show that the resulting budget can be used more effectively with a flexible worker assignment strategy that asks fewer workers to analyze data that are easy to label and more workers to analyze data that require extra scrutiny. Our main contribution is to show how worker label aggregation can be formulated probabilistically, and how the number of workers allocated to each task can be optimized based on task difficulty alone, without using worker profiles. Our representative target task is identifying entailment between sentences. To illustrate the proposed methodology, we conducted simulation experiments that use a machine learning system as a proxy for workers and demonstrate its advantages over a state-of-the-art commercial optimizer.
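The difficulty-based allocation idea in the abstract can be sketched under a deliberately simplified model (this is an illustrative assumption, not the paper's probabilistic aggregation): each task has a known per-worker error rate, labels are combined by majority vote over an odd number of workers, and a greedy rule spends the remaining budget on whichever task gains the largest drop in expected aggregation error.

```python
from math import comb

def majority_error(p, n):
    # Probability that a majority vote of n workers, each erring
    # independently with probability p, is wrong (n is kept odd so
    # there are no ties).
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

def allocate(difficulties, budget):
    # Greedy budget split: start every task at one worker, then add
    # workers two at a time (preserving odd counts) to the task whose
    # expected majority-vote error drops the most. The error rates in
    # `difficulties` are hypothetical per-task difficulty estimates.
    counts = [1] * len(difficulties)
    spent = len(difficulties)
    while spent + 2 <= budget:
        gains = [majority_error(p, n) - majority_error(p, n + 2)
                 for p, n in zip(difficulties, counts)]
        i = max(range(len(gains)), key=gains.__getitem__)
        counts[i] += 2
        spent += 2
    return counts

# An easy task (10% error) and a hard one (40% error) sharing 8 labels:
print(allocate([0.1, 0.4], 8))  # harder task ends up with more workers
```

Under this toy model the harder task absorbs most of the extra budget, which is the qualitative behavior the flexible assignment strategy targets; the paper's actual allocation is derived from its probabilistic aggregation model rather than this majority-vote heuristic.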
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=Xff3XrqcgI&noteId=Xff3XrqcgI
Changes Since Last Submission: The previous submission "doesn't follow TMLR's stylefile format (margins aren't of the correct size)," as pointed out in the rejection comment. The margin issue has been fixed in the new submission. Furthermore, one author has been removed in the new submission after a careful review of the contributions.
Assigned Action Editor: ~Ian_A._Kash1
Submission Number: 5785