Saho Corpus: Semi-automation of Verb Conjugation in Saho: Verbs Class I

Published: 31 Dec 2021, Last Modified: 05 Jan 2025. Ethnorêma 18 (2022). CC BY 4.0
Abstract: This article develops a semi-automatic morphological analysis module, SaCoFlexor, that generates all inflected forms of the 585 class I (C-I) verbs of the Saho language registered in the current Saho Corpus, which serves as the basic data dictionary, and presents the results. SaCoFlexor correctly identified 98.8% of the items in the corpus and classified them into four major subcategories according to the initial phoneme of the word. It then correctly generated their inflected forms, producing 13,455 new words, each tagged and linked to its root. The output data increased the number of words in the Saho Corpus and improved the performance of its computational linguistics functions, including word frequency generation, word identification, concordance, collocations, and spell checking.
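The abstract describes the pipeline only at a high level: classify each C-I root into a subcategory by its initial phoneme, generate its inflected forms, tag each form, and link it back to its root so the corpus can use the forms for word identification and spell checking. The following is a minimal Python sketch of that flow under stated assumptions. It is not SaCoFlexor itself. Everything specific in it is hypothetical: the subcategory labels S1 to S4 and their phoneme criteria, the placeholder person prefixes and tense suffixes, the epenthesis rule, the function names, and the made-up roots `abal` and `dik`. None of these reflect actual Saho morphology or the article's rules.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class TaggedForm:
    surface: str   # generated word form
    root: str      # dictionary root the form is linked to
    subclass: str  # HYPOTHETICAL subcategory label
    tag: str       # morphological tag, e.g. "2SG.PST"


def classify(root: str) -> str:
    """Assign one of four HYPOTHETICAL subclasses by the root's initial phoneme."""
    first = root[0]
    if first == "a":
        return "S1"
    if first in "ei":
        return "S2"
    if first in "ou":
        return "S3"
    return "S4"  # consonant-initial roots


# Placeholder affixes for illustration only; they are NOT Saho morphology.
PERSON_PREFIXES = {"1SG": "", "2SG": "t", "3SG": "y", "1PL": "n"}
TENSE_SUFFIXES = {"PRS": "a", "PST": "e"}


def attach_prefix(prefix: str, root: str, subclass: str) -> str:
    """Hypothetical subclass rule: S4 (consonant-initial) roots take an epenthetic 'i'."""
    if prefix and subclass == "S4":
        return prefix + "i" + root
    return prefix + root


def generate(root: str) -> list[TaggedForm]:
    """Generate every person x tense form of one root, tagged and linked to it."""
    subclass = classify(root)
    forms = []
    for (person, pre), (tense, suf) in product(PERSON_PREFIXES.items(), TENSE_SUFFIXES.items()):
        surface = attach_prefix(pre, root, subclass) + suf
        forms.append(TaggedForm(surface, root, subclass, f"{person}.{tense}"))
    return forms


def build_index(roots: list[str]) -> dict[str, list[tuple[str, str]]]:
    """Map each generated surface form to all of its (root, tag) analyses."""
    index: dict[str, list[tuple[str, str]]] = {}
    for root in roots:
        for form in generate(root):
            index.setdefault(form.surface, []).append((form.root, form.tag))
    return index


def identify(word: str, index: dict[str, list[tuple[str, str]]]) -> list[tuple[str, str]]:
    """Return all analyses of a word; an empty list flags a candidate misspelling."""
    return index.get(word, [])


if __name__ == "__main__":
    index = build_index(["abal", "dik"])
    print(identify("tabala", index))  # [('abal', '2SG.PRS')]
    print(identify("tidike", index))  # [('dik', '2SG.PST')]
    print(len(index))                 # 16
```

Keying the index on surface forms and storing a list of analyses keeps ambiguous forms (one string generated from several roots or tags) visible instead of silently overwriting them, which matters when the same index backs both word identification and spell checking.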