Constructing Arabic Reading Comprehension Datasets: Arabic WikiReading and KaifLematha

Published: 01 Jan 2022 · Last Modified: 20 Feb 2025 · Lang. Resour. Evaluation 2022 · CC BY-SA 4.0
Abstract: Neural machine reading comprehension models have gained immense popularity over the last decade, driven by the availability of large-scale English datasets. A key limiting factor for developing and investigating neural models for Arabic is the scarcity of suitable datasets: those currently available are either too small to train deep neural models or were created by automatically translating English datasets, in which case the exact answer may not appear in the corresponding text. In this paper, we propose two high-quality, large-scale Arabic reading comprehension datasets, Arabic WikiReading and KaifLematha, with over 100K instances. We followed two different methodologies to construct them. First, we employed crowdworkers to collect non-factoid questions from Wikipedia paragraphs. Then, we constructed Arabic WikiReading following a distant supervision strategy, using the Wikidata knowledge base as ground truth. We carried out both quantitative and qualitative analyses to investigate the level of reasoning required to answer the questions in the proposed datasets. A competitive pre-trained language model attained F1 scores of 81.77 and 68.61 on Arabic WikiReading and KaifLematha, respectively, but struggled to extract a precise answer for KaifLematha. Human performance on the KaifLematha development set reached an F1 score of 82.54, which leaves ample room for improvement.
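To make the distant-supervision strategy mentioned in the abstract concrete, here is a minimal Python sketch of the general idea (not the authors' actual pipeline): a Wikidata (property, value) fact about an entity is turned into a question-answer instance only if the value occurs verbatim in the entity's Wikipedia article, so the answer is guaranteed to be extractable from the context. All names and fields below are hypothetical.

```python
# Hypothetical sketch of distant supervision for QA-pair construction:
# a Wikidata fact becomes a training instance only when its value
# can be located verbatim in the article text.

from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    question: str   # derived from the Wikidata property label, e.g. "date of birth"
    answer: str     # the Wikidata value, used as the gold answer span
    context: str    # the Wikipedia article text


def align_fact(property_label: str, value: str, article_text: str) -> Optional[Instance]:
    """Keep the fact as an instance only when the prospective answer
    actually appears in the article text."""
    if value in article_text:
        return Instance(question=property_label, answer=value, context=article_text)
    return None  # Discard facts whose value never surfaces in the text.
```

In practice such a pipeline would also record the character offset of the answer span and filter near-duplicate matches, but the verbatim-containment check above is the core of the distant-supervision signal.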
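The F1 scores reported above are presumably the standard SQuAD-style token-overlap metric for extractive QA; a minimal sketch, assuming whitespace tokenization:

```python
# SQuAD-style token-level F1: harmonic mean of precision and recall
# over the multiset of tokens shared by prediction and gold answer.

from collections import Counter


def token_f1(prediction: str, gold: str) -> float:
    pred_tokens = prediction.split()
    gold_tokens = gold.split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this metric, a prediction that overlaps the gold answer only partially still receives credit, which is why a model can score a respectable F1 while, as noted for KaifLematha, still struggling to extract the precise answer span.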