Amharic Light StemmerDownload PDF

25 Sept 2019 (modified: 05 May 2023)ICLR 2020 Conference Withdrawn SubmissionReaders: Everyone
Abstract: Stemming is the process of removing affixes( i.e. prefixes, infixes and suffixes) that improve the accuracy and performance of information retrieval systems.This paper presents the reduction of Amharic words to corresponding stem where with the intention that it preserves semantic information. The proposed approach efficiently removes affixes from an Amharic word. The process of removing such affixes (prefixes, infixes and suffixes) from a word to its base form is called stemming. While many stemmers exist for dominant languages such as English, under resourced languages such as Amharic which lacks such powerful tool support. In this paper, we design a light Amharic stemmer relying on the rules that receives an Amharic word and then it finds a match to the beginning of a word to the possible prefixes and to its ending with the possible suffixes and finally it checks whether it has infix. The final result is the stem if there is any prefix, infix or/and suffix, otherwise it remains in one of the earlier states. The technique does not rely on any additional resource (e.g. dictionary) to verify the generated stem. The performance of the generated stemmer is evaluated using manually annotated Amharic words. The result is compared with current state-of-the-art stemmer for Amharic showing an increase of 7% in stemmer correctness.
Keywords: Amharic light Stemmer, Affixes, Amharic Sentiment Classification
TL;DR: Amharic Light Stemmer is designed for improving performance of Amharic Sentiment Classification.
Original Pdf: pdf
4 Replies

Loading