Spectral Clustering and Labeling for Crowdsourcing with Inherently Distinct Task Types

Saptarshi Mandal; Seo Taek Kong; Dimitrios Katselis; R. Srikant

Published: 03 Oct 2025, Last Modified: 03 Oct 2025. Accepted by TMLR. License: CC BY 4.0

Abstract: The Dawid-Skene model is the most widely assumed model in the analysis of crowdsourcing algorithms that estimate ground-truth labels from noisy worker responses. In this work, we are motivated by crowdsourcing applications where workers have distinct skill sets and their accuracy additionally depends on a task's type. Focusing on the case where there are two types of tasks, we propose a spectral method to partition tasks into two groups such that a worker has the same reliability for all tasks within a group. Our analysis reveals a separability condition such that task types can be perfectly recovered if the number of workers $n$ scales logarithmically with the number of tasks $d$. Numerical experiments show how clustering tasks by type before estimating ground-truth labels enhances the performance of crowdsourcing algorithms in practical applications.
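To make the abstract's idea concrete, here is a minimal sketch of one way to split tasks into two types with a spectral step. It is an illustration under stated assumptions, not the authors' algorithm, whose details and separability condition are not given on this page. The assumptions are binary labels in {-1, +1}, every worker answering every task, and a hypothetical skill profile in which half of the workers are reliable only on one type and the other half only on the other. The function names `simulate_responses` and `cluster_tasks` are made up for this example.

```python
import numpy as np


def simulate_responses(n_workers, n_tasks, rng):
    """Generate synthetic responses for two task types (assumed setup)."""
    task_type = rng.integers(0, 2, size=n_tasks)   # hidden type of each task
    labels = rng.choice([-1, 1], size=n_tasks)     # hidden ground-truth labels
    # Hypothetical skills: accuracy 0.9 on one type, 0.5 (a coin flip) on the other.
    p = np.full((n_workers, 2), 0.5)
    half = n_workers // 2
    p[:half, 0] = 0.9
    p[half:, 1] = 0.9
    # Worker i answers task j correctly with probability p[i, task_type[j]].
    correct = rng.random((n_workers, n_tasks)) < p[:, task_type]
    responses = np.where(correct, labels, -labels)
    return responses, task_type, labels


def cluster_tasks(responses):
    """Split tasks into two groups using the second eigenvector of an affinity matrix."""
    n_workers = responses.shape[0]
    # |x_j . x_k| / n does not depend on the unknown label signs of tasks j and k;
    # it tends to be larger when both tasks share a type under the assumed skills.
    affinity = np.abs(responses.T @ responses) / n_workers
    _, eigvecs = np.linalg.eigh(affinity)          # eigenvalues in ascending order
    second = eigvecs[:, -2]                        # eigenvector of the 2nd-largest eigenvalue
    return (second > 0).astype(int)


rng = np.random.default_rng(0)
responses, task_type, labels = simulate_responses(n_workers=200, n_tasks=100, rng=rng)
predicted = cluster_tasks(responses)
# Cluster ids are arbitrary, so score agreement up to swapping the two ids.
agreement = max(np.mean(predicted == task_type), np.mean(predicted != task_type))
```

Once tasks are grouped, any standard label-aggregation method could be run separately within each group. This works because a worker's reliability is then constant within a group, which is the pipeline the abstract describes.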

Submission Length: Long submission (more than 12 pages of main content)

Changes Since Last Submission: This is the camera-ready version of the manuscript (Paper 4271), which was accepted with minor revision. **Changes since last revision:** A discussion section has been added on how the approach could be extended to realistic settings and on the practical implications of our work, in response to the minor revision suggested by the action editor.

Code: https://github.com/Saptarsh/Saptarsh.github.io/blob/master/MultiTypeCrowdsourcing_Saptarshi.ipynb

Assigned Action Editor: ~Jinwoo_Shin1

Submission Number: 4271
