Spectral Clustering and Labeling for Crowdsourcing with Inherently Distinct Task Types

TMLR Paper4271 Authors

20 Feb 2025 (modified: 28 Mar 2025) · Under review for TMLR · CC BY 4.0
Abstract: The Dawid-Skene model is the most widely assumed model in the analysis of crowdsourcing algorithms that estimate ground-truth labels from noisy worker responses. In this work, we are motivated by crowdsourcing applications where workers have distinct skill sets and their accuracy additionally depends on a task's type. Focusing on the case where there are two types of tasks, we propose a spectral method to partition tasks into two groups such that a worker has the same reliability for all tasks within a group. Our analysis reveals a separability condition such that task types can be perfectly recovered if the number of workers $n$ scales logarithmically with the number of tasks $d$. Numerical experiments show how clustering tasks by type before estimating ground-truth labels enhances the performance of crowdsourcing algorithms in practical applications.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Jinwoo_Shin1
Submission Number: 4271