Contextual Evaluation of LLMs' Performance on Primary Education Science Learning Contents in the Yoruba Language

Published: 03 Mar 2024 · Last Modified: 11 Apr 2024 · AfricaNLP 2024 · CC BY 4.0
Keywords: Large Language Models (LLMs), Natural Language Processing (NLP), Multilingual Evaluation, Inclusive Education, Question-Answering (QA), Named Entity Recognition (NER), Part-of-Speech (POS) Tagging, Paraphrasing
TL;DR: The study evaluates LLMs (ChatGPT-3.5, Gemini, PaLM 2) on Yoruba primary science education content across four tasks (POS tagging, NER, QA, paraphrasing) and emphasizes the need for inclusive LLMs in diverse language and cultural settings.
Abstract: In an era marked by the rapid evolution of artificial intelligence, large language models (LLMs) such as ChatGPT 3.5, Llama, and PaLM 2 have become instrumental in transforming educational paradigms. Trained mainly on English with a mix of data from other languages, these LLMs show exceptional ability to understand and generate complex human language constructs, enabling revolutionary applications in education and raising the possibility of enriched, personalized educational experiences. LLMs can streamline the instructional design process and focus it on developing content that students need to progress and that resonates with learners' realities, thereby improving learning outcomes even in primary science. Moreover, learning science in the student's mother tongue has been shown to significantly boost learning and assimilation, and this should be encouraged, especially in rural areas. However, the technological advancement of LLMs raises a pivotal question about the inclusivity and effectiveness of these models in catering to low-resource languages, such as Yoruba, particularly in the domain of primary education science. The unique linguistic structures, idiomatic expressions, and cultural references inherent in Yoruba present formidable challenges for models predominantly trained on high-resource languages. This research critically evaluates LLMs' ability to comprehend and generate contextually relevant science education content in Yoruba, aiming to bridge the educational resource gap for Yoruba-speaking primary learners, especially those in underrepresented communities. We conduct a thorough evaluation of ChatGPT 3.5, Gemini, and PaLM 2 across four tasks using a manually developed primary science dataset in the Yoruba language. This approach allows us to assess the models' ability to understand and reproduce the intricacies of Yoruba primary science content without losing the context or meaning of the sentences. We focus on zero-shot settings for these LLMs to improve reproducibility. Our extensive experimental results reveal the models' comparative underperformance on various natural language processing (NLP) tasks in the Yoruba language, underscoring the need for further research and development of more language-specific and domain-specific technologies, particularly for primary science education in low-resource languages.
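To make the zero-shot setup concrete, the sketch below shows how such an evaluation query might be issued for one of the four tasks (POS tagging) through the openai Python client. The paper does not publish its prompts, harness, or dataset, so the model name, prompt wording, tag set, and example sentence here are illustrative assumptions, not the authors' actual protocol.

```python
# Minimal sketch of a zero-shot Yoruba POS-tagging query.
# Assumptions (not from the paper): the `openai` Python client (v1+),
# the "gpt-3.5-turbo" model, and the prompt/sentence shown below.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative Yoruba sentence ("Water is vital for plant life.");
# the paper's manually developed dataset is not reproduced here.
sentence = "Omi ṣe pàtàkì fún ìwàláàyè ohun ọ̀gbìn."

# Zero-shot: the prompt describes the task but includes no solved examples.
prompt = (
    "Tag each word in the following Yoruba sentence with its part of speech "
    "(NOUN, VERB, ADJ, ADV, PRON, ADP, CONJ, PART, NUM, PUNCT). "
    "Return one 'word -> TAG' pair per line.\n\n"
    f"Sentence: {sentence}"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # deterministic decoding aids reproducibility
)

print(response.choices[0].message.content)
```

A fixed prompt template and temperature 0 keep runs comparable across models; analogous zero-shot prompts would cover the NER, QA, and paraphrasing tasks.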
Submission Number: 36