Romanian Syllabification Using Deep Neural Networks

Published: 01 Jan 2021, Last Modified: 01 Jun 2023, SLERD 2021
Abstract: Syllabification may seem trivial for humans, but it can be a challenging task for automated text analysis. In this study, we explore three approaches to syllabifying Romanian words using state-of-the-art deep learning architectures for sequence prediction, namely BiLSTM, CNN, and transformer. In contrast to previous approaches, our models take into account the part of speech of the word, which can weigh heavily when words have the same written form but different syllabification. Our best model, a BiLSTM architecture with a conditional random field on top, obtains an accuracy of approximately 98%, surpassing all previous state-of-the-art models. Our model is a building block for multiple smart learning ecosystems, ranging from better hyphenation software for text evaluation to text-to-speech and speech-to-text frameworks employed in smart homes or personal assistants.
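Framing syllabification as sequence prediction means turning each word into a sequence of per-character labels that a tagger (for example a BiLSTM with a CRF layer) can predict. The abstract does not state the labeling scheme, so the sketch below assumes a simple binary one: tag 1 marks a character that is followed by a syllable boundary, tag 0 marks one that is not. The part of speech is attached to every character position as an extra input feature. The helper names `encode`, `decode`, and `with_pos` and the tag strings `"ADJ"`/`"NOUN"` are hypothetical and only illustrate the data conversion around a model, not the model itself.

```python
import unicodedata
from typing import List, Tuple


def encode(hyphenated: str) -> Tuple[str, List[int]]:
    """Convert a hyphenated word such as 'ca-să' into ('casă', [0, 1, 0, 0]).

    Tag 1 means "a syllable boundary follows this character"; tag 0 means it does not.
    (Hypothetical binary scheme, assumed for illustration.)
    """
    # Normalize to NFC so letters like 'ă' count as one character, not two code points.
    text = unicodedata.normalize("NFC", hyphenated)
    chars: List[str] = []
    tags: List[int] = []
    for ch in text:
        if ch == "-":
            if not tags:
                raise ValueError("a word cannot start with a syllable boundary")
            tags[-1] = 1  # mark the boundary on the preceding character
        else:
            chars.append(ch)
            tags.append(0)
    return "".join(chars), tags


def decode(word: str, tags: List[int]) -> str:
    """Rebuild the hyphenated form from a word and its per-character tags."""
    if len(word) != len(tags):
        raise ValueError("exactly one tag per character is required")
    pieces: List[str] = []
    for ch, tag in zip(word, tags):
        pieces.append(ch)
        if tag == 1:
            pieces.append("-")
    return "".join(pieces)


def with_pos(word: str, pos: str) -> List[Tuple[str, str]]:
    """Pair each character with the word's POS tag (hypothetical feature layout)."""
    return [(ch, pos) for ch in word]


if __name__ == "__main__":
    word, tags = encode("fru-mos")
    print(word, tags)                 # frumos [0, 0, 1, 0, 0, 0]
    print(decode(word, tags))         # fru-mos
    print(with_pos(word, "ADJ")[:2])  # [('f', 'ADJ'), ('r', 'ADJ')]
```

Because the tag sequence always has exactly one label per character, any sequence labeller's output can be mapped back to a hyphenated word without ambiguity; a model that sees `with_pos` features can, in principle, assign different tags to homographs that differ only in part of speech.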