D3CODE: Disentangling Disagreements in Data across Cultures on Offensiveness Detection and Evaluation

ACL ARR 2024 April Submission419 Authors

15 Apr 2024 (modified: 07 Jun 2024) | ACL ARR 2024 April Submission | Everyone, Ethics Reviewers, Ethics Chairs | CC BY 4.0
Abstract: While human annotations play a crucial role in language technologies, annotator subjectivity has long been overlooked in data collection. Although recent studies have critically examined this issue, they are often situated in the Western context, documenting differences solely across age, gender, or racial groups. Furthermore, much of this work overlooks the fact that individuals within demographic groups may hold diverse values, which can influence their perceptions beyond group trends. To effectively incorporate these considerations into NLP pipelines, we need datasets with extensive parallel annotations from varied social and cultural groups. In this paper, we introduce the D3CODE dataset: a large-scale cross-cultural dataset of parallel annotations for offensive language in over 4.5K sentences annotated by a pool of over 4K annotators, balanced across gender and age, from 21 countries representing 8 geo-cultural regions. The dataset also contains each annotator's moral values captured along six dimensions of moral foundations: care, equality, proportionality, authority, loyalty, and purity. Our analysis reveals substantial regional variations in annotators' perceptions that are shaped by individual moral values, offering crucial insights for building pluralistic, culturally sensitive NLP models.
Paper Type: Long
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: human behavior analysis, language/cultural bias analysis, hate-speech detection, human factors in NLP, values and culture
Contribution Types: Data resources, Data analysis, Surveys
Languages Studied: English
Submission Number: 419