Amharic Text-to-Speech System

29 Jul 2023 (modified: 07 Dec 2023), DeepLearningIndaba 2023 Conference Submission
Keywords: Amharic, Text-to-Speech, NLP, DeepLearning, Disabilities, Vision Impairment
Abstract: Artificial Intelligence (AI) can help address critical challenges in education and foster inclusive learning environments, advancing Sustainable Development Goal (SDG) 4, which aims to ensure quality education for all. In many developing countries, however, providing quality education to individuals with disabilities remains a significant obstacle. Ethiopia faces numerous challenges in achieving inclusive education, especially for female students with sensory impairments, and its high prevalence of blindness and vision impairment makes improving educational opportunities for disabled individuals urgent. This study develops an end-to-end Text-to-Speech (TTS) system for Amharic, a low-resource Ethiopian Semitic language. As an assistive AI technology, TTS can greatly enhance educational accessibility for students with disabilities, including those with visual or hearing impairments. Yet Amharic speakers lack access to such technologies, which limits opportunities for inclusive education in the region. Developing an Amharic TTS system poses particular challenges: quality training data is scarce, training and inference are inefficient, and convergence is slow with a large vocabulary. To address these, the study employs state-of-the-art techniques, particularly Tacotron2. A representative Amharic text corpus was compiled from diverse sources, including the Bible, student textbooks, and news, so that all Amharic sounds are covered. The resulting speech corpus contains utterances ranging from 2 to 20 seconds in duration, totalling 25 hours of speech recorded from both male and female speakers. The model was trained extensively, and the system's performance was evaluated using the Mean Opinion Score (MOS) technique. The assessment found the synthesized speech to be highly intelligible and natural-sounding, with an encouraging overall performance score of 95.5%.
This result indicates that the system renders most words recognizably, thereby helping to address language barriers in education. The study's objective is to promote equitable and inclusive education for visually impaired and blind students in Ethiopia. In addition, the TTS system can be used as a plugin for e-learning tutorials, providing real-time Amharic subtitles and further reducing language barriers in education. Beyond its local impact, the study has broader implications for Sub-Saharan Africa (SSA) as a contribution to AI-driven educational solutions across the region. By fostering inclusive learning practices and lifelong learning opportunities, AI-powered TTS technology has the potential to transform education in SSA, in line with progress towards SDG 4.
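For readers unfamiliar with MOS evaluation, the following is a minimal sketch of how listener ratings are typically aggregated. The abstract does not state the rating scale or how the 95.5% figure was derived, so everything here is an assumption for illustration: a 1 to 5 rating scale, a normal-approximation confidence interval, and a percentage obtained by dividing the mean rating by the scale maximum. The function name `mean_opinion_score` and the sample ratings are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(ratings, scale_max=5):
    """Aggregate listener ratings into a MOS summary.

    Assumes ratings lie on a 1..scale_max scale (an assumption; the
    abstract does not specify the scale). Returns a tuple of
    (mos, ci_half_width, percent_of_scale).
    """
    if not ratings:
        raise ValueError("ratings must be non-empty")
    if any(r < 1 or r > scale_max for r in ratings):
        raise ValueError("rating outside the assumed 1..scale_max range")

    mos = mean(ratings)
    # Normal-approximation 95% confidence interval half-width;
    # undefined for a single rating, so report 0.0 in that case.
    if len(ratings) > 1:
        ci_half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    else:
        ci_half_width = 0.0
    # One possible way to express MOS as a percentage (hypothetical).
    percent_of_scale = 100.0 * mos / scale_max
    return mos, ci_half_width, percent_of_scale


# Hypothetical ratings from five listeners for one synthesized utterance.
mos, ci, pct = mean_opinion_score([5, 4, 5, 5, 4])
print(f"MOS = {mos:.2f} +/- {ci:.2f} ({pct:.1f}% of scale)")
```

In practice, ratings are usually collected separately for intelligibility and naturalness and averaged over many utterances and listeners; the same function could be applied to each criterion.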
Submission Category: Machine learning algorithms
Submission Number: 44