IGBOSUM1500 - INTRODUCING THE IGBO TEXT SUMMARIZATION DATASET

Chinedu Emmanuel Mbonu; Chiamaka Ijeoma Chukwuneke; Roseline Uzoamaka Paul; Ignatius Ezeani; Ikechukwu Onyenwe

Published: 08 Apr 2022, Last Modified: 05 May 2023 | AfricaNLP 2022 | Readers: Everyone

Keywords: Igbo, summarisation, natural language processing

TL;DR: Creating IgboSum1500 - a dataset for Igbo text summarization research

Abstract: Igbo, along with Hausa and Yorùbá, is one of the three prominent indigenous Nigerian languages. It is spoken by the Igbo people of southeastern Nigeria, with over 30 million speakers resident in Nigeria and many more abroad. In NLP terms, Igbo is still considered to be acutely under-resourced and 'scraping-by' according to Joshi et al. (2020). Efforts to develop IgboNLP are ongoing, for example in part-of-speech tagging (Onyenwe et al., 2019), diacritic restoration (Ezeani et al., 2016), embedding-based analogy and similarity (Ezeani et al., 2018), machine translation (Ezeani et al., 2020; Nekoto et al., 2020), and named-entity recognition (Adelani et al., 2021). However, these efforts need to be sustained by creating more resources and expanding the coverage of common downstream NLP tasks in Igbo. One such task is text summarization.
