[Published] Non-binary gender representation in Wikidata

Published: 29 Aug 2023, Last Modified: 29 Aug 2023, Wikidata Workshop 2023
Abstract: In the era of big data, new ethical questions have arisen from the creation of large knowledge bases, whose data is produced, consumed, and shared by millions of users, both humans and machines. These knowledge bases often contain biographical information about people, including sensitive data such as gender, sex, ethnicity, or sexual orientation. Implicit biases in such data can generate unfairness (Veale and Binns 2017; Mehrabi et al. 2021) and lead to discriminatory applications that impact marginalized communities (Buolamwini and Gebru 2018; Bender et al. 2021). This is particularly true for the trans and non-binary communities, who experience discrimination on the basis of gender identity. Digital projects have struggled to cope with the wider societal acceptance of the fact that gender is not binary (Kessler and McKenna 1985), and in many cases, they have perpetuated — or even amplified — the misgendering and erasure of trans and non-binary people that has occurred in society throughout history (Keyes 2018). In this chapter, we present a preliminary quantitative analysis of non-binary gender identities in a large-scale knowledge base, Wikidata (Vrandečić and Krötzsch 2014). Wikidata is a collaborative project that allows the editing of knowledge — and even the data model itself — by a broad community of users (Piscopo, Phethean, and Simperl 2017). The present research constitutes the first step of our project, Wikidata Gender Diversity (WiGeDi), which aims to investigate the issue of how gender identities are represented in the knowledge base. This study aims to contribute to the growing area of data ethics by offering, for the first time, an empirical exploration of the representation of non-binary gender identities in a large knowledge base, and by providing fresh insights and data to gender studies scholars interested in more qualitative approaches to research. 
Since every edit and every user discussion throughout the history of Wikidata is archived in the project itself and made publicly accessible, this study gives us a unique and comprehensive overview of how non-binary identities have been represented in Wikidata, what exactly has been represented, and why users have made certain choices. In particular, we performed our analysis from three different and complementary perspectives: (1) the modeling question, looking at how the Wikidata ontology has evolved to support non-binary representation, e.g., by updating the properties that directly or indirectly express gender; we aim to analyze the Wikidata ontology to identify representational issues and potential areas of improvement; (2) the data question, computing statistics about non-binary gender representation in the knowledge base and analyzing it from a quantitative point of view, including by comparing non-binary people described in Wikidata to the general population(s) of non-binary people in society; (3) the community question, looking at how the Wikidata community has handled the evolution towards a more inclusive non-binary representation, by analyzing user discussions about the topic in a quantitative way; indeed, gender representation is often intrinsically connected to language. We believe that only by answering all three questions will it be possible to obtain a comprehensive overview of non-binary gender representation in Wikidata.
Submission Number: 17