TangaleNLP: Building Po Tangle to English Parallel Corpora and Machine Translation of the Tangle (Tangale) Language

Published: 03 Mar 2024, Last Modified: 11 Apr 2024, Venue: AfricaNLP 2024, License: CC BY 4.0
Keywords: Tangale language, Machine translation, Digital empowerment, Cultural preservation, Linguistic inclusion, African languages, Transfer learning, Socio-economic development, Data scarcity, Ethical practices, Pre-trained models, Parallel corpus, Indigenous languages
TL;DR: This paper showcases the development of a machine translation system for the Tangale language, aiming to empower its speakers digitally while preserving cultural heritage and promoting linguistic inclusion in the digital age.
Abstract: In a digitally connected world, language barriers silence millions, leaving communities such as the Tangle (Tangale) with limited access to information and online opportunities while their rich heritage fades. This research offers hope that natural language processing and machine translation can bridge this gap. Our efforts go beyond Po Tangle: we are paving the way for similar systems in other African languages and promoting a more diverse digital space. We built a Po Tangle-English machine translation system by fine-tuning the pre-trained M2M100 model on 1150 parallel sentences from our dataset, and the results show that the system works and produces translations. It achieves an evaluation BLEU score of 6.7604 and a prediction BLEU score of 6.0101, which indicates the potential for fluent translations given more substantial data. By building the parallel corpus with native speakers to ensure cultural authenticity, we are discovering much more than just numbers. This work empowers communities to take control, enabling socio-economic development and preserving linguistic heritage. Our research is having an impact in the form of more targeted interventions, better education, and more vibrant online communities. It is paving the way for a future where every voice is heard and celebrated, regardless of language. This is a movement towards inclusion and equality: we are breaking down language barriers, celebrating the symphony of human voices, and ensuring that no community is left behind in the digital age.
Submission Number: 54