Abstract: This paper evaluates the effectiveness of an AI-based mobile application that applies text-to-speech models to the Bahnar language. The application chains two models in sequence: Grad-TTS first, followed by HiFi-GAN. Grad-TTS was employed to ensure highly accurate pronunciation of Bahnar words without being constrained by the dataset. HiFi-GAN, on the other hand, was fine-tuned for the Bahnaric language to enhance the quality of the synthesized audio and to produce a native-like Bahnar voice and accent. The speech generated by our model achieved a high level of naturalness.