Abstract: Crowdsourcing has become a popular paradigm for collecting large-scale labeled datasets by leveraging numerous annotators. However, these annotators often provide noisy labels due to varying expertise. Truth inference aims to infer accurate consensus labels from noisy crowdsourced annotations. Existing approaches rely heavily on hand-engineered assumptions or ground truth data, limiting their applicability. To address this, we propose GOVERN, a graph contrastive learning framework for truth inference without such external supervision. GOVERN employs a novel graph data augmentation strategy to generate views capturing worker coordination patterns. A contrastive objective then encourages invariant representations across views, enabling the discovery of features related to the hidden consensus. Further, a label correction method based on k-nearest neighbors refines noisy pseudo-labels to supervise model training. Comprehensive experiments on 9 real-world datasets demonstrate that GOVERN outperforms state-of-the-art truth inference techniques.
Loading