Romanian Syllabication Using Machine LearningOpen Website

2013 (modified: 06 Nov 2022)TSD 2013Readers: Everyone
Abstract: The task of finding syllable boundaries can be straightforward or challenging, depending on the language. Text-to-speech applications have been shown to perform considerably better when syllabication, whether orthographic or phonetic, is employed as a means of breaking down the text into units bellow word level. Romanian syllabication is non-trivial mainly but not exclusively due to its hiatus-diphthong ambiguity. This phenomenon affects both phonetic and orthographic syllabication. In this paper, we focus on orthographic syllabication for Romanian and show that the task can be carried out with a high degree of accuracy by using sequence tagging. We compare this approach to support vector machines and rule-based methods. The features we used are simply character n-grams with end-of-word marking.
0 Replies

Loading