Unlocking the Potential: an evaluation of Text-to-Speech Models for the Bahnar Language

Published: 01 Jan 2023, Last Modified: 16 May 2025, Venue: BCD 2023, License: CC BY-SA 4.0
Abstract: This paper evaluates the effectiveness of an AI-based mobile application that uses text-to-speech models for the Bahnar language. The application chains two models in sequence: Grad-TTS first, followed by HiFi-GAN. Grad-TTS was employed to ensure highly accurate pronunciation of Bahnar words without being constrained by the dataset. HiFi-GAN, on the other hand, was fine-tuned for the Bahnar language to enhance the quality of the synthesized audio, in order to produce a native-like Bahnar voice and accent. The sounds generated by our model achieved a high level of naturalness.
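The two-stage arrangement described in the abstract can be illustrated with a minimal sketch. In pipelines of this kind, the acoustic model (here Grad-TTS) typically maps text to a mel-spectrogram and the vocoder (here HiFi-GAN) maps that spectrogram to a waveform. The code below is not the authors' implementation: `dummy_acoustic_model`, `dummy_vocoder`, `synthesize`, and the constants `N_MELS` and `HOP_LENGTH` are hypothetical stand-ins that only show how the two stages compose.

```python
from typing import Callable, List

Mel = List[List[float]]   # mel-spectrogram: frames x mel bins
Waveform = List[float]    # mono audio samples

N_MELS = 80       # assumed number of mel bins (not stated in the abstract)
HOP_LENGTH = 256  # assumed audio samples per mel frame (not stated in the abstract)


def dummy_acoustic_model(text: str) -> Mel:
    """Hypothetical stand-in for Grad-TTS: one silent mel frame per character."""
    return [[0.0] * N_MELS for _ in text]


def dummy_vocoder(mel: Mel) -> Waveform:
    """Hypothetical stand-in for HiFi-GAN: HOP_LENGTH silent samples per frame."""
    return [0.0] * (len(mel) * HOP_LENGTH)


def synthesize(
    text: str,
    acoustic_model: Callable[[str], Mel] = dummy_acoustic_model,
    vocoder: Callable[[Mel], Waveform] = dummy_vocoder,
) -> Waveform:
    """Run the two stages in sequence: text -> mel-spectrogram -> waveform."""
    mel = acoustic_model(text)
    return vocoder(mel)
```

In a real system the two stand-ins would be replaced by the trained models; because `synthesize` depends only on the callables' input and output types, either stage can be swapped or fine-tuned independently of the other.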