Abstract: This paper presents the African Dialect Dataset for Sentiment Analysis (AfriDial), a new natural language processing (NLP) dataset. The dataset is intended to support the classification of multilingual human text written in speakers' mother tongues. AfriDial contains around 14k documents in seven distinct dialects: Tunisian, Moroccan, Chadian, Mauritanian, Burkinabe, Cameroonian, and Congolese. The documents, which cover a wide range of subjects such as politics, sports, entertainment, and technology, were gathered from open social media and through crowdsourcing. Each document is labeled with one of three sentiment classes: positive, negative, or neutral. AfriDial is intended as a resource for researchers working on multilingual text classification and NLP. The paper also presents a baseline model that applies transfer learning with the bidirectional encoder representations from transformers (BERT) architecture to the AfriDial dataset, together with an experimental study that introduces further methods and contributions to the field of dialectal NLP.