Keywords: Phonemizer, Phonemes, Grapheme-to-Phoneme, Qur'an, Automatic Speech Recognition
Abstract: Qur'anic recitation follows explicit Tajweed rules that standard Arabic grapheme-to-phoneme tools do not capture, which limits phoneme-level research on the Qur'an. We introduce a modular, computationally efficient Python API for the Hafs 'an Asim recitation style that converts Qur'anic text into a configurable 71-symbol phoneme inventory, comprehensively encoding Tajweed rules such as Idgham, Iqlab, Ikhfaa, Qalqala, Tafkheem, and Waqf. We anticipate use cases in speech recognition, mispronunciation detection, text-to-speech, linguistic analysis, and pedagogical applications. The tool currently supports only the Hafs recitation style; extensions to other recitation styles are discussed. The code (https://github.com/Hetchy/Quranic-Phonemizer) and user interface (https://quranicphonemizer.com) are released as open source.
Track: Track 1: ML on Islamic Content / ML for Muslim Communities
Submission Number: 32