One-Vs-Rest Neural Network English Grapheme Segmentation: A Linguistic Perspective

ACL ARR 2024 June Submission838 Authors

13 Jun 2024 (modified: 02 Jul 2024), License: CC BY 4.0
Abstract: Grapheme-to-Phoneme (G2P) correspondences underpin tasks such as text-to-speech (TTS) synthesis or automatic speech recognition. The G2P process takes words in their written form and generates their pronunciation. In this paper, we critique the status quo definition of "grapheme" that underlies the majority of modern approaches: a forced-alignment process relating a single character to either a phoneme or a blank unit. We develop a linguistically motivated redefinition from simple concepts such as vowel and consonant counts and word length, and offer a proof-of-concept implementation based on a multi-binary neural classification task. Our model achieves state-of-the-art results with a 31.86% Word Error Rate on a standard benchmark, while generating linguistically meaningful grapheme segmentations.
Paper Type: Short
Research Area: Phonology, Morphology and Word Segmentation
Research Area Keywords: phonology, grapheme-to-phoneme conversion, pronunciation modeling, subword representations
Contribution Types: Data resources, Data analysis, Theory
Languages Studied: English
Submission Number: 838
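
Note on the metric: the abstract reports a Word Error Rate (WER) without defining it. In G2P evaluation, WER is commonly taken to be the percentage of words whose predicted phoneme sequence does not exactly match the reference. The minimal sketch below assumes that definition; the function name `word_error_rate` and the toy ARPAbet-style data are hypothetical illustrations, not taken from the paper or its benchmark.

```python
from typing import Sequence


def word_error_rate(
    predicted: Sequence[Sequence[str]],
    reference: Sequence[Sequence[str]],
) -> float:
    """Percentage of words whose predicted phoneme sequence differs
    from the reference (assumed G2P definition of WER)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("reference must not be empty")
    # A word counts as an error if any phoneme differs or the lengths differ.
    errors = sum(1 for p, r in zip(predicted, reference) if list(p) != list(r))
    return 100.0 * errors / len(reference)


# Hypothetical toy data (ARPAbet-style symbols), not from the paper.
refs = [["K", "AE", "T"], ["SH", "IH", "P"], ["TH", "IH", "NG"], ["F", "OW", "N"]]
hyps = [["K", "AE", "T"], ["S", "IH", "P"], ["TH", "IH", "NG"], ["F", "OW", "N"]]

wer = word_error_rate(hyps, refs)
print(f"WER: {wer:.2f}%")
```

In this toy case one of the four words is mispredicted, so the sketch reports a WER of 25%. Whether the paper scores multiple reference pronunciations per word differently is not stated in the abstract.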