Predictive text for agglutinative and polysynthetic languages


16 Nov 2021 (modified: 05 May 2023)
ACL ARR 2021 November Blind Submission
Readers: Everyone
Abstract: This paper presents a set of experiments in morphological modelling and prediction. We examine the tasks of segmentation and predictive text entry for two under-resourced indigenous languages, K'iche' and Chukchi. We use different segmentation methods to build datasets for language modelling and then train three types of models: single-way segmented, trained on data from one segmentor; two-way segmented, trained on concatenated data from two segmentors; and finetuned, trained on two datasets from different segmentors. We measure word-level and character-level perplexities of the language models and find that single-way segmented models trained on morphologically segmented data and finetuned models perform best. Finally, we test the language models on the task of predictive text entry using gold standard data and measure the average number of clicks per character and the keystroke savings rate. We find that the models trained on morphologically segmented data work better, although with substantial room for improvement. In conclusion, we propose using morphological segmentation to improve the end-user experience of predictive text, and we plan to test this assumption by training other models and experimenting on more languages.
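The abstract names three evaluation measures without defining them. The sketch below shows the commonly used definitions, which are assumed here and may differ from the paper's exact formulas: perplexity as the exponentiated negative mean log-probability per unit (word or character), keystroke savings rate as the percentage of keystrokes avoided relative to typing every character, and clicks per character as total clicks divided by text length. All function and parameter names are hypothetical.

```python
import math


def perplexity(total_log_prob: float, n_units: int) -> float:
    """Perplexity from a summed natural-log probability.

    Assumed definition: exp(-total_log_prob / n_units). Passing the number
    of words gives a word-level value; passing the number of characters
    gives a character-level value for the same text.
    """
    if n_units <= 0:
        raise ValueError("n_units must be positive")
    return math.exp(-total_log_prob / n_units)


def keystroke_savings_rate(keys_without: int, keys_with: int) -> float:
    """Percentage of keystrokes saved by prediction (assumed definition)."""
    if keys_without <= 0:
        raise ValueError("keys_without must be positive")
    return 100.0 * (keys_without - keys_with) / keys_without


def clicks_per_character(clicks: int, n_characters: int) -> float:
    """Average number of clicks needed per character of the target text."""
    if n_characters <= 0:
        raise ValueError("n_characters must be positive")
    return clicks / n_characters


# Hypothetical example: a 10-character text entered with 6 clicks.
ksr = keystroke_savings_rate(keys_without=10, keys_with=6)
cpc = clicks_per_character(clicks=6, n_characters=10)
```

Under these assumed definitions the two text-entry measures are complementary: fewer clicks per character corresponds to a higher keystroke savings rate.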
