Automatic Text Summarization for Moroccan Arabic Dialect Using an Artificial Intelligence Approach

Published: 01 Jan 2022, Last Modified: 14 Sept 2023
CBI 2022
Readers: Everyone
Abstract: A major advantage of artificial intelligence is its ability to perform tasks automatically and quickly at a human-like level. This ability is needed in many fields, and particularly in Automatic Text Summarization (ATS). Several advances have been made in ATS in recent years for both extractive and abstractive approaches, notably with the advent of sequence-to-sequence (seq2seq) and Transformer-based models. Despite this progress, the Arabic language remains underrepresented in this field because of its complexity and the lack of datasets for ATS. Although some ATS work exists for Modern Standard Arabic (MSA), little has been done for the Arabic dialects, which are more prevalent on social networking platforms and on the Internet in general. As an initial step toward meeting this need, we present the first ATS work on the Moroccan dialect known as Darija. This paper introduces the first dataset intended for the summarization of articles written in Darija. In addition, we present state-of-the-art results, measured with the ROUGE metric, for extractive methods based on BERT embeddings and k-means clustering, as well as for abstractive methods based on Transformer models.
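To make the extractive idea concrete, here is a minimal sketch of clustering-based sentence selection. It is an illustration and not the authors' implementation. It assumes each sentence already has an embedding vector (in the paper these would come from a BERT model; here they are hand-written toy vectors). The function names `kmeans` and `select_sentences`, and the choice to seed the centroids with the first k vectors, are assumptions made for this example.

```python
import math
from typing import List, Sequence


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(vectors: List[Sequence[float]], k: int, iters: int = 20) -> List[List[float]]:
    """Plain Lloyd's k-means.

    Assumption for determinism: the first k vectors seed the centroids
    (real systems usually use random or k-means++ initialization).
    Requires 1 <= k <= len(vectors).
    """
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        # Assignment step: attach every vector to its nearest centroid.
        clusters: List[List[Sequence[float]]] = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: _dist(v, centroids[c]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = [
            _mean(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids


def select_sentences(
    sentences: List[str], embeddings: List[Sequence[float]], k: int
) -> List[str]:
    """Return the sentence nearest to each centroid, in document order."""
    centroids = kmeans(embeddings, k)
    chosen = set()
    for c in centroids:
        idx = min(range(len(embeddings)), key=lambda i: _dist(embeddings[i], c))
        chosen.add(idx)
    return [sentences[i] for i in sorted(chosen)]


# Toy data: two well-separated groups of three "sentences" each.
sentences = ["s0", "s1", "s2", "s3", "s4", "s5"]
embeddings = [[0, 0], [0, 1], [0, 2], [10, 10], [10, 11], [10, 12]]
summary = select_sentences(sentences, embeddings, k=2)
```

Because two centroids can share the same nearest sentence, the summary may contain fewer than k sentences. Returning the picks in document order keeps the extract readable.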