Building a Combined Morphological Model for Russian Word Forms

Published: 01 Jan 2021, Last Modified: 10 Aug 2024, Venue: AIST 2021, License: CC BY-SA 4.0
Abstract: In recent years, high-precision machine learning models have been built for Russian, both for traditional inflectional morphological analysis and for morpheme segmentation of words. These two morphological tasks are evidently related, and some NLP applications may require both, so the development and evaluation of a combined morphological model is of research interest. Such a model is expected to be useful for processing texts in morphologically rich languages (e.g., Russian), in particular for deriving the meaning of new words that are rarely encountered in texts. The paper presents a neural model that performs both inflectional analysis of Russian word forms (with morphological disambiguation) and their segmentation into constituent morphs, with classification of the morphs. To train the model, a dataset was built by adding morphemic labels to the SynTagRus corpus, and transfer learning techniques were applied. Experimental evaluation shows that the model performs well: 94.2% precision for morphological tag disambiguation and 88–91% word-level classification accuracy for segmentation.