Keywords: Pluralistic annotation, personalization, social choice
TL;DR: We propose a metric to quantify the error incurred by majority vote aggregation and present a mitigation via personalization.
Abstract: Machine learning practitioners frequently use majority vote to resolve disagreement in multi-annotator datasets. While this approach is natural in settings where a single ground-truth label exists for each instance, it obscures disagreement on subjective annotation tasks. In domains such as language modeling, information retrieval, and top-k recommendation, models must avoid suppressing minority views and signal when the answer to a query is contentious. We propose personalized error metrics to formalize the requirement of strong performance across a heterogeneous user population. Following this framework, we develop an algorithm for training an ensemble of models, each specialized for a different segment of the population.
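The abstract's contrast between average and per-segment performance can be made concrete with a small numeric sketch. The Python below is not the paper's metric or algorithm: the two-segment annotator population, the 30% flip rate, and the worst-segment definition of personalized error are all assumptions introduced here for illustration.

```python
import numpy as np

# Illustrative sketch only: a two-segment synthetic annotator population.
# Every name here (segment, contentious, worst_segment_error) is a
# hypothetical stand-in, not the paper's actual construction.
rng = np.random.default_rng(0)
n_instances, n_annotators = 200, 9

# 6 "majority" annotators and 3 "minority" annotators.
segment = np.array([0] * 6 + [1] * 3)
# On roughly 30% of instances, the two segments genuinely disagree.
contentious = rng.random(n_instances) < 0.3
base = rng.integers(0, 2, size=n_instances)  # shared label elsewhere

labels = np.tile(base[:, None], (1, n_annotators))
rows, cols = np.where(contentious)[0], np.where(segment == 1)[0]
labels[np.ix_(rows, cols)] = 1 - labels[np.ix_(rows, cols)]  # minority flips

# Majority vote collapses each instance to a single label, erasing the
# minority segment's view on the contentious instances.
majority = (labels.mean(axis=1) > 0.5).astype(int)
disagreement = (labels != majority[:, None]).mean(axis=0)  # per annotator

# One way to personalize the error: report the worst segment-level
# average, so a predictor must do well on every segment, not just overall.
average_error = disagreement.mean()
worst_segment_error = max(
    disagreement[segment == s].mean() for s in np.unique(segment)
)
print(f"average error: {average_error:.3f}")              # low: minority outvoted
print(f"worst-segment error: {worst_segment_error:.3f}")  # high, near the flip rate
```

On this synthetic data, majority vote looks accurate on average (about 0.10 error) while erring on roughly 30% of the minority segment's labels; that gap between average and worst-segment error is the kind of failure an ensemble of segment-specialized models would target.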
Submission Number: 22