Abstract: Due to concerns about human error in crowdsourcing, it is standard practice to collect labels for the same data point from multiple internet workers. We show that the resulting budget can be used more effectively with a flexible worker assignment strategy that asks fewer workers to analyze data that are easy to label and more workers to analyze data that require extra scrutiny. Our main contribution is to show how worker label aggregation can be formulated probabilistically, and how the number of workers allocated to each task can be optimized based on task difficulty alone, without using worker profiles. Our representative target task is identifying entailment between sentences. To illustrate the proposed methodology, we conducted simulation experiments that use a machine learning system as a proxy for workers and demonstrate its advantages over a state-of-the-art commercial optimizer.
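The difficulty-based allocation idea in the abstract can be sketched under a deliberately simplified model (this is an illustrative assumption, not the paper's probabilistic aggregation): each task has a known per-worker error rate, labels are combined by majority vote over an odd number of workers, and a greedy rule spends the remaining budget on whichever task gains the largest drop in expected aggregation error.

```python
from math import comb

def majority_error(p, n):
    # Probability that a majority vote of n workers, each erring
    # independently with probability p, is wrong (n is kept odd so
    # there are no ties).
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

def allocate(difficulties, budget):
    # Greedy budget split: start every task at one worker, then add
    # workers two at a time (preserving odd counts) to the task whose
    # expected majority-vote error drops the most. The error rates in
    # `difficulties` are hypothetical per-task difficulty estimates.
    counts = [1] * len(difficulties)
    spent = len(difficulties)
    while spent + 2 <= budget:
        gains = [majority_error(p, n) - majority_error(p, n + 2)
                 for p, n in zip(difficulties, counts)]
        i = max(range(len(gains)), key=gains.__getitem__)
        counts[i] += 2
        spent += 2
    return counts

# An easy task (10% error) and a hard one (40% error) sharing 8 labels:
print(allocate([0.1, 0.4], 8))  # harder task ends up with more workers
```

Under this toy model the harder task absorbs most of the extra budget, which is the qualitative behavior the flexible assignment strategy targets; the paper's actual allocation is derived from its probabilistic aggregation model rather than this majority-vote heuristic.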
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=Xff3XrqcgI&noteId=Xff3XrqcgI
Changes Since Last Submission: The previous submission "doesn't follow TMLR's stylefile format (margins aren't of the correct size)," as pointed out in the rejection comment. The margin issue has been fixed in the new submission. Furthermore, one author has been removed in the new submission after a careful review of the contributions.
Assigned Action Editor: ~Ian_A._Kash1
Submission Number: 5785