Pidgin Science Voices: A Community-Driven Speech Corpus for Inclusive STEM Education

Published: 22 Sept 2025, Last Modified: 22 Sept 2025, WiML @ NeurIPS 2025, CC BY 4.0
Keywords: ASR, Speech, STEM, Machine Learning, AI, TTS
Abstract: Scientific knowledge in Nigeria is often restricted to academic English, leaving out many of the roughly 75 million speakers of Nigerian Pidgin. Over 38 million Nigerian adults remain functionally illiterate, creating a significant accessibility gap in STEM education. Building on our previous work, in which we collected English scientific text and translated it into Nigerian Pidgin to build a machine translation system that can accurately translate this low-resource language, we now extend the work from written translation to speech. Our goal is to build the first large-scale, science-focused Nigerian Pidgin speech corpus, enabling automatic speech recognition (ASR), text-to-speech (TTS), and voice-enabled learning tools that democratize scientific knowledge for underrepresented communities.
Submission Number: 265