Keywords: part of speech tagger, NLP, Chichewa
TL;DR: Development of a Chichewa part-of-speech tagger using an HMM and the Viterbi algorithm for NLP applications, addressing the absence of POS taggers for Chichewa.
Abstract: Part-of-speech (POS) tagging is the process of marking each word in a text with its part of speech, based on its definition and its relationship with adjacent and related words in a phrase, sentence, or paragraph [1]. POS tagging is a prerequisite for search engines, automatic spelling completion, named entity recognition, machine translation, and other natural language processing (NLP) applications. Without a POS tagger, search engines retrieve information less efficiently, spell checkers are less accurate, machine translation systems are less able to translate a given text, and named entity recognition systems are less able to detect entity names. Although POS taggers exist for many languages, Chichewa lacks one. This project will develop a Chichewa POS tagger for use in NLP applications that involve the Chichewa language. The tagger has three components: preprocessing, tokenization, and tagging. It will be implemented with a probabilistic (stochastic) approach using a Hidden Markov Model (HMM) and the Viterbi algorithm, and it will be developed in Python.
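The tokenize, train, and tag pipeline named in the abstract can be illustrated with a minimal sketch. It is not the project's implementation: the function names (`tokenize`, `train_hmm`, `viterbi`), the two-tag tag set, the add-one smoothing, and the three-sentence toy corpus are all assumptions made for illustration. A real tagger would be trained on a much larger annotated Chichewa corpus with a fuller tag set.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start state

def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()

def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    vocab = set()
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab

def log_prob(counter, key, num_outcomes):
    """Add-one smoothed log probability (smoothing choice is an assumption)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + num_outcomes))

def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence for `words` under the HMM."""
    if not words:
        return []
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot for unseen words
    # best[i][t]: log probability of the best path ending in tag t at word i
    best = [{t: log_prob(trans[START], t, n_tags)
                + log_prob(emit[t], words[0], n_words) for t in tags}]
    back = []  # back[i][t]: best previous tag for tag t at word i + 1
    for word in words[1:]:
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags,
                       key=lambda p: best[-1][p] + log_prob(trans[p], t, n_tags))
            scores[t] = (best[-1][prev] + log_prob(trans[prev], t, n_tags)
                         + log_prob(emit[t], word, n_words))
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Toy hand-tagged corpus, invented for this sketch only.
TOY_CORPUS = [
    [("mwana", "NOUN"), ("akudya", "VERB"), ("nsima", "NOUN")],
    [("galu", "NOUN"), ("akuthamanga", "VERB")],
    [("mwana", "NOUN"), ("akuona", "VERB"), ("galu", "NOUN")],
]
model = train_hmm(TOY_CORPUS)
print(viterbi(tokenize("Galu akudya nsima"), *model))
# → ['NOUN', 'VERB', 'NOUN']
```

Working in log space avoids numeric underflow on long sentences, and the smoothing keeps unseen words and unseen tag transitions from receiving zero probability.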
Submission Category: Machine learning algorithms
Submission Number: 9