NaijaRC: A Multi-choice Reading Comprehension Dataset for Nigerian Languages

Published: 03 Mar 2024, Last Modified: 11 Apr 2024
Venue: AfricaNLP 2024
License: CC BY 4.0
Keywords: Reading Comprehension, Natural Language Understanding
TL;DR: In this paper, we present NaijaRC, a new reading comprehension dataset for Nigerian languages. We report cross-lingual transfer results from an English dataset with three PLMs: AfroXLMR-base, Serengeti, and OFA.
Abstract: In this paper, we create NaijaRC, a new multi-choice Nigerian Reading Comprehension dataset based on high-school RC examinations for three Nigerian national languages: Hausa (hau), Igbo (ibo), and Yorùbá (yor). We provide baseline results by performing cross-lingual transfer with several pre-trained encoder-only models, using the Belebele training data, which is drawn mostly from the RACE dataset (RACE is based on English exams for middle and high school Chinese students, and is very similar to our dataset). Additionally, we provide results from prompting large language models (LLMs) such as GPT-4.
Submission Number: 31