Keywords: Word prediction, POS, Statistical approach
TL;DR: An Amharic word sequence prediction model is developed with statistical methods, using a Hidden Markov Model that incorporates detailed part-of-speech tags and user profiling (adaptation).
Abstract: Word prediction, the task of guessing which word comes next based on the current context, is the
main focus of this study. Even though Amharic is used by a large population, no significant work has been
done on the topic. In this study, an Amharic word sequence prediction model is developed using machine
learning. We use a statistical approach based on a Hidden Markov Model that incorporates detailed
part-of-speech tags and user profiling (adaptation). One motivation for this research is to overcome the
challenges posed by inflected languages, for which word sequence prediction is a difficult task
(Gustavii & Pettersson, 2003; Seyyed & Assi, 2005). Such languages are morphologically rich and have an
enormous number of word forms; that is, a single word can take many different forms. Because Amharic is
morphologically rich, it shares this problem (Tessema, 2014). This makes word prediction much more
difficult and results in poor performance.
Previous research used a dictionary approach that does not consider context information. For a language
like Amharic, storing all word forms in a dictionary does not solve the problem as it does for English and
other less inflected languages. Therefore, we introduce two models, a tags-and-words model and a linear
interpolation model, that use part-of-speech tag information in addition to word n-grams in order to
maximize the likelihood that the suggestions are syntactically appropriate. The statistics included in the
systems range from single-word frequencies to part-of-speech tag n-grams. We describe a combined
statistical and lexical word prediction system and develop bigram and trigram Amharic language models for
training. The overall study follows the Design Science Research Methodology (DSRM).
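
The idea of linearly interpolating n-gram statistics can be illustrated with a minimal sketch. It is not the system described in the abstract: it covers only word unigrams, bigrams and trigrams, leaves out the part-of-speech tag n-grams and user adaptation, and uses placeholder tokens rather than Amharic data. The function names (`train_ngrams`, `interpolated_prob`, `predict_next`) and the interpolation weights are assumptions chosen for illustration.

```python
from collections import Counter


def train_ngrams(sentences):
    """Count word unigrams, bigrams and trigrams from tokenized sentences."""
    uni, bi, tri = Counter(), Counter(), Counter()
    ctx1, ctx2 = Counter(), Counter()  # how often each history was seen
    for sent in sentences:
        tokens = ["<s>", "<s>"] + list(sent)  # pad so every word has two predecessors
        for i in range(2, len(tokens)):
            w2, w1, w = tokens[i - 2], tokens[i - 1], tokens[i]
            uni[w] += 1
            bi[(w1, w)] += 1
            tri[(w2, w1, w)] += 1
            ctx1[w1] += 1
            ctx2[(w2, w1)] += 1
    total = sum(uni.values())
    return uni, bi, tri, ctx1, ctx2, total


def interpolated_prob(word, w2, w1, model, lambdas=(0.2, 0.3, 0.5)):
    """P(word | w2 w1) as a weighted sum of unigram, bigram and trigram estimates.

    The weights are illustrative assumptions and should sum to 1.
    """
    uni, bi, tri, ctx1, ctx2, total = model
    l1, l2, l3 = lambdas
    p_uni = uni[word] / total if total else 0.0
    p_bi = bi[(w1, word)] / ctx1[w1] if ctx1[w1] else 0.0
    p_tri = tri[(w2, w1, word)] / ctx2[(w2, w1)] if ctx2[(w2, w1)] else 0.0
    return l1 * p_uni + l2 * p_bi + l3 * p_tri


def predict_next(w2, w1, model, k=3):
    """Return the k vocabulary words with the highest interpolated probability."""
    vocab = model[0]
    scored = [(interpolated_prob(w, w2, w1, model), w) for w in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken alphabetically
    return [w for _, w in scored[:k]]
```

Calling `predict_next(w2, w1, train_ngrams(corpus))` on a tokenized corpus returns ranked suggestions for the word following `w2 w1`. Interpolation keeps a suggestion possible even when the exact trigram history was never observed, which matters when a language has many word forms and the data is sparse.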