Neural Sequence Labeling Based Sentence Segmentation for Myanmar Language

Ye Kyaw Thu, Thura Aung, Thepchai Supnithi

Published: 01 Jan 2023, Last Modified: 08 Jan 2026. License: CC BY-SA 4.0
Abstract: In informal Myanmar, the variety for which most NLP applications are used, there is no predefined rule for marking the end of a sentence. In this paper, we therefore contribute the first Myanmar sentence segmentation corpus and systematically experiment with twelve neural sequence labeling architectures, each trained and tested on both sentence-only and sentence + paragraph data. The word LSTM + Softmax architecture achieved the highest accuracy: 99.95% when trained and tested on sentence-only data and 97.40% when trained and tested on sentence + paragraph data.
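To make the sequence labeling framing concrete, here is a minimal sketch of how sentence segmentation can be cast as per-word tagging. It assumes a simple two-tag scheme (B for the first word of a sentence, O for every other word); this scheme, the function names, and the placeholder tokens are illustrative assumptions and are not taken from the paper, whose actual tag set and corpus format are not given in the abstract. The neural model that would predict the tags is left out.

```python
from typing import List


def sentences_to_tags(sentences: List[List[str]]) -> List[str]:
    """Flatten segmented sentences into one tag per word.

    Assumed scheme (not from the paper): "B" marks the first word of a
    sentence, "O" marks every other word.
    """
    tags: List[str] = []
    for words in sentences:
        for position in range(len(words)):
            tags.append("B" if position == 0 else "O")
    return tags


def tags_to_sentences(words: List[str], tags: List[str]) -> List[List[str]]:
    """Rebuild sentences from a flat word list and its predicted tags."""
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    sentences: List[List[str]] = []
    for word, tag in zip(words, tags):
        # Start a new sentence on "B", or if no sentence has been opened yet.
        if tag == "B" or not sentences:
            sentences.append([word])
        else:
            sentences[-1].append(word)
    return sentences


# Placeholder tokens stand in for already word-segmented Myanmar text.
gold = [["w1", "w2", "w3"], ["w4"], ["w5", "w6"]]
flat_words = [word for sentence in gold for word in sentence]
gold_tags = sentences_to_tags(gold)
restored = tags_to_sentences(flat_words, gold_tags)
```

Under this framing, a tagger only has to emit one label per word, and the sentence boundaries follow from the label sequence, so no explicit end-of-sentence punctuation is needed in the input.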