Common Voice and accent choice: data contributors self-describe their spoken accents in diverse ways

Published: 01 Jan 2023, Last Modified: 01 May 2024. Venue: EAAMO 2023. License: CC BY-SA 4.0
Abstract: The use of machine learning (ML)-powered speech technologies has increased significantly in recent years [40, 56, 72]. The datasets used for training speech models often represent demographic features of the speaker, such as gender, age, and accent. These axes are frequently used to evaluate the training set and model for bias [52]. Here, we focus on how accent is represented in voice data, because accent bias has adverse consequences. We perform document analysis on several voice datasets to identify how accents are currently represented. We then analyse and visualise speaker-described accents from Mozilla’s Common Voice (CV) v13 English dataset, forming an emergent taxonomy of accent descriptors. We repeat this process using the CV v13 Kiswahili dataset, demonstrating that the taxonomy has use beyond English. We find that accents are currently represented in ways that are geographically, and predominantly nationally, bound. While this pattern also appears in speaker-described accents from CV, those self-descriptions reveal a more diverse set of descriptors. This work provides some early evidence for re-thinking how accents are represented in datasets intended for ML applications. Our tooling is open-sourced, and we invite further work that uses our taxonomy to assess accent bias in speech data and models.