Keywords: Crowdsourcing; Label Aggregation; Annotator Modeling; Graph-Based Reliability Estimation; Weak Supervision; Learning from Noisy Labels; Correlated Annotator Errors
Abstract: Crowdsourced label aggregation methods model each annotator independently, overlooking the collective structure of disagreements across the annotator pool. That structure encodes which workers systematically corrupt the majority signal even when their mutual agreement is high. We propose DGN (Disagreement Graph Networks), which represents annotators as nodes in a weighted disagreement graph and uses eigenvector centrality as a reliability signal that is provably monotone in the annotator error rate (Theorem 1). Our main contribution, DGN-S, requires no EM and no dataset-specific tuning. On benchmarks explicitly designed to stress-test correlated low-quality annotators (the regime our theory targets), DGN-S matches or exceeds EM-based state-of-the-art methods while running $27\times$ faster than MACE at $M=100$ annotators (0.064s vs. 1.73s). In homogeneous annotator pools, where graph structure carries a weaker signal, accuracy degrades gracefully by at most 0.5 percentage points, consistent with our theoretical predictions. On the real Music genre benchmark, DGN-S matches MACE (80.1% vs. 80.0%) at a fraction of the runtime. For settings requiring interpretability, DGN-EM embeds the same graph as a Bayesian prior, yielding a label-free annotator reliability diagnostic (Spearman $\rho\approx 0.94$) with $O(M)$ parameters that remain stable in ultra-sparse regimes where $O(MK^2)$ confusion-matrix methods degrade.
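The central idea of the abstract, a disagreement graph over annotators scored by eigenvector centrality, can be illustrated with a minimal sketch. This is not the authors' DGN-S implementation: the edge weight (pairwise disagreement rate on co-labeled items), the inverse-centrality vote weight, and all function names are assumptions made for illustration, and the simulated annotators below err independently rather than in a correlated way.

```python
import numpy as np

def disagreement_graph(labels):
    """labels: (N items, M annotators) int array, -1 marks a missing label.
    Edge weight (assumed) = fraction of co-labeled items on which two annotators differ."""
    n_annot = labels.shape[1]
    observed = labels >= 0
    W = np.zeros((n_annot, n_annot))
    for a in range(n_annot):
        for b in range(a + 1, n_annot):
            shared = observed[:, a] & observed[:, b]
            if shared.any():
                W[a, b] = W[b, a] = np.mean(labels[shared, a] != labels[shared, b])
    return W

def eigenvector_centrality(W):
    """Leading eigenvector of the symmetric weight matrix, normalized to sum to 1."""
    _, vecs = np.linalg.eigh(W)      # eigenvalues are returned in ascending order
    v = np.abs(vecs[:, -1])          # eigenvector sign is arbitrary, so take magnitudes
    return v / v.sum()

def weighted_vote(labels, weights, n_classes):
    """Aggregate labels by summing each annotator's weight onto the class they chose."""
    scores = np.zeros((labels.shape[0], n_classes))
    for a in range(labels.shape[1]):
        rows = np.flatnonzero(labels[:, a] >= 0)
        np.add.at(scores, (rows, labels[rows, a]), weights[a])
    return scores.argmax(axis=1)

# Hypothetical simulation: three fairly accurate annotators and two noisy ones.
rng = np.random.default_rng(0)
n_items = 2000
error_rates = np.array([0.10, 0.10, 0.10, 0.45, 0.45])
truth = rng.integers(0, 2, size=n_items)
flips = rng.random((n_items, error_rates.size)) < error_rates
labels = np.where(flips, 1 - truth[:, None], truth[:, None])

W = disagreement_graph(labels)
centrality = eigenvector_centrality(W)
weights = 1.0 / (centrality + 1e-12)   # assumed weighting: high centrality means low trust
pred = weighted_vote(labels, weights, n_classes=2)
accuracy = float(np.mean(pred == truth))
```

In this toy setting the noisier annotators disagree with everyone more often, so they receive higher centrality and lower vote weight. That is the direction of monotonicity the abstract attributes to Theorem 1, although the sketch does not reproduce the theorem's conditions.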
Submission Number: 467