GE2PE: Persian End-to-End Grapheme-to-Phoneme Conversion

ACL ARR 2024 June Submission3240 Authors

15 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · Readers: Everyone · CC BY 4.0
Abstract: Text-to-Speech (TTS) systems have made significant strides in generating speech from grapheme sequences. However, for low-resource languages, these models still struggle to produce natural and intelligible speech. Grapheme-to-Phoneme (G2P) conversion addresses this challenge by enriching the input sequence with phonetic information. Despite this progress, existing G2P systems remain limited on Persian text because of the complexity of Persian transcription. In this study, we focus on enriching resources for the Persian language. To this end, we introduce two novel G2P training datasets, one manually labeled and the other machine-generated. Together, these datasets comprise over five million sentences paired with their phoneme sequences. Additionally, we propose two evaluation datasets tailored to Persian-specific sub-tasks, including Kasre-Ezafe detection, homograph disambiguation, and handling of out-of-vocabulary (OOV) words. To tackle the unique challenges of Persian, we develop a new sentence-level End-to-End (E2E) model trained with a two-step approach designed to maximize the impact of the manually labeled data. The results show that our model surpasses the state of the art by 1.86% in word error rate (WER), 4.03% in Kasre-Ezafe detection recall, and 3.42% in homograph disambiguation accuracy.
Paper Type: Long
Research Area: Phonology, Morphology and Word Segmentation
Research Area Keywords: grapheme-to-phoneme conversion, datasets for low resource languages, data augmentation
Contribution Types: Publicly available software and/or pre-trained models, Data resources
Languages Studied: Persian
Submission Number: 3240
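The abstract reports gains in word error rate and Kasre-Ezafe detection recall without defining either metric. The sketch below shows one plausible way to compute them; it is not the authors' evaluation script. It assumes the speech-recognition-style WER (word-level edit distance divided by the number of reference words). Some G2P work instead counts the fraction of words whose pronunciation is not an exact match, and the abstract does not say which definition is used. The per-word boolean Ezafe flags, the function names, and the romanized phoneme strings in the example are all hypothetical illustrations.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete a reference word
                curr[j - 1] + 1,         # insert a hypothesis word
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def word_error_rate(references, hypotheses):
    """Corpus-level WER: total word edits divided by total reference words.

    Assumption: each sentence is a space-separated string of phonemized words.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must align sentence by sentence")
    edits = 0
    words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words, hyp_words = ref.split(), hyp.split()
        edits += edit_distance(ref_words, hyp_words)
        words += len(ref_words)
    return edits / words


def ezafe_recall(gold_flags, predicted_flags):
    """Share of reference words carrying Kasre-Ezafe that the model also marks.

    Hypothetical input format: one list of booleans per sentence, one flag per word.
    """
    hits = 0
    total = 0
    for gold, pred in zip(gold_flags, predicted_flags):
        if len(gold) != len(pred):
            raise ValueError("flag lists must align word by word")
        for g, p in zip(gold, pred):
            if g:
                total += 1
                hits += int(p)
    return hits / total if total else 0.0
```

For example, `word_error_rate(["ketAb-e man injA-st", "salAm"], ["ketAb man injA-st", "salAm"])` counts one substitution over four reference words and yields 0.25, and `ezafe_recall([[True, False, True]], [[True, False, False]])` yields 0.5. Recall ignores false positives by design, so a report based on it would usually pair it with precision.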