How Effective Are AI Models in Translating English Scientific Texts to Nigerian Pidgin: A Low-resource Language?

Flora Oladipupo; Anthony Soronnadi; Ife Adebara; Olubayo Adekanmbi

Published: 05 Mar 2025, Last Modified: 19 Mar 2025. Venue: ICLR 2025 Workshop ICBINB. License: CC BY 4.0

Track: long paper (up to 4 pages)

Keywords: Machine translation, Nigerian Pidgin, Scientific Texts

Abstract: This research explores the challenges and limitations of applying deep learning models to the translation of scientific texts from English to Nigerian Pidgin, a widely spoken but low-resource language in West Africa. Despite advances in machine translation, translating domain-specific content such as biological research papers presents distinct obstacles, including data scarcity, linguistic complexity, and poor model generalization. We compare the performance of several AI models (Pidgin-UNMT, mT5-base, AfriTeVa base, Afri-mt5 base, and GPT 4.0) using the BLEU, chrF, TER, and AfriCOMET metrics on Eng-PidginBioData, a newly created dataset of biological texts. Our findings reveal significant gaps in model performance, emphasizing the need for more domain-specific fine-tuning, improved dataset creation, and collaboration with native speakers to enhance translation accuracy. By documenting the real-world challenges encountered in applying deep learning to low-resource languages, from limited data to domain mismatches, this study suggests strategies for overcoming these barriers and making AI-driven translation more effective for underrepresented languages, in support of more inclusive and impactful dissemination of scientific knowledge.

Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.

Submission Number: 29
