WORD SEQUENCE PREDICTION FOR AMHARIC LANGUAGE

Nuniyat Kifle; Ermias Abebe

WORD SEQUENCE PREDICTION FOR AMHARIC LANGUAGE

Nuniyat Kifle, Ermias Abebe

25 Sept 2019 (modified: 05 May 2023)ICLR 2020 Conference Blind SubmissionReaders: Everyone

Keywords: Word prediction, POS, Statistical approach

TL;DR: Amharic word sequence prediction model is developed with statistical methods using Hidden Markov Model by incorporating detailed parts of speech tag , user profiling or adaptation.

Abstract: Word prediction is guessing what word comes after, based on some current information, and it is the main focus of this study. Even though Amharic is used by a large number of populations, no significant work is done on the topic. In this study, Amharic word sequence prediction model is developed using Machine learning. We used statistical methods using Hidden Markov Model by incorporating detailed parts of speech tag and user profiling or adaptation. One of the needs for this research is to overcome the challenges on inflected languages. Word sequence prediction is a challenging task for inflected languages (Gustavii &Pettersson, 2003; Seyyed & Assi, 2005). These kinds of languages are morphologically rich and have enormous word forms, which is a word can have different forms. As Amharic language is morphologically rich it shares the problem (Tessema, 2014).This problem makes word prediction system much more difficult and results poor performance. Previous researches used dictionary approach with no consideration of context information. Due to this reason, storing all forms in a dictionary won’t solve the problem as in English and other less inflected languages. Therefore, we introduced two models; tags and words and linear interpolation that use parts of speech tag information in addition to word n-grams in order to maximize the likelihood of syntactic appropriateness of the suggestions. The statistics included in the systems varies from single word frequencies to parts-of-speech tag n-grams. We described a combined statistical and lexical word prediction system and developed Amharic language models of bigram and trigram for the training purpose. The overall study followed Design Science Research Methodology (DSRM).

Original Pdf: pdf

4 Replies

Loading