Document Classification for the Under-resourced Amharic Language

Michael Melese Woldeyohannis

Published: 02 Aug 2024, Last Modified: 12 Nov 2024

Venue: WiNLP 2024

License: CC BY 4.0

Keywords: Amharic Document, News Classification, Under-resourced language

TL;DR: Under-resourced Amharic language document classification

Abstract: Natural language processing (NLP) is severely hampered by a scarcity of digital resources. This is especially true for Amharic, a morphologically rich language with few resources. In response, 67,739 Amharic news documents in 8 categories are gathered from web sources. A baseline document classification experiment is carried out to validate the usability of the collected corpora from various domains. In the absence of linguistic information, the experimental results show that deep learning achieves 84.53% accuracy.

Submission Number: 26
