Precision-Driven Low-Resource Speech Synthesis For Bangla Text-To-Speech System

Tabassum Sadia Shahjahan; Md. Ismail Hossain; Kazi Rafat; Mohammad Ruhul Amin; Fuad Rahman; Nabeel Mohammed

Published: 05 Mar 2024, Last Modified: 12 May 2024, PML4LRS Poster, License: CC BY 4.0

Keywords: speech recognition, low-resource, text-to-speech, end-to-end, deep learning, quantization, transformer, bangla

TL;DR: The paper proposes a robust text-to-speech model trained on Bangla, a low-resource language.

Abstract: Recent developments in deep learning and artificial intelligence have facilitated the widespread commercial adoption of text-to-speech models that produce intelligible, natural-sounding speech. Although numerous speech synthesis models are widely available for languages such as English and Chinese, synthesizing speech for extremely low-resource languages such as Bangla remains a formidable challenge. In this paper, we adopt single-stage and two-stage training approaches, followed by quantization techniques, to generate high-quality speech from a Bangla dataset. Our experimental results show that the proposed models achieve both intelligibility and naturalness with reduced inference time, even in extremely low-resource settings. We are the first to provide a robust Bangla text-to-speech system usable for both academic and commercial applications.
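The abstract does not say which quantization techniques the models use. Purely as a hedged illustration of the general idea, the sketch below shows generic symmetric per-tensor int8 post-training quantization of a list of weights. Each float weight is mapped to an 8-bit integer plus one shared scale factor, which is a common way to shrink model weights. This is not the authors' method. The function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical.

```python
# Hypothetical sketch: generic symmetric per-tensor int8 quantization,
# NOT the specific technique used in the paper.

def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] with one shared scale.

    Each weight is approximated as w ~= scale * q.
    """
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, which would give a zero scale.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clip to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 values and the scale."""
    return [scale * v for v in q]


# Hypothetical example weights, chosen only for illustration.
weights = [0.5, -1.27, 0.0, 0.01]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
print(q, scale, restored)
```

Storing `q` as 8-bit integers instead of 32-bit floats cuts weight storage by roughly 4x. Because no value here is clipped, the rounding error per weight is bounded by half the scale. Whether and how much this reduces inference time depends on the runtime and hardware.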

Submission Number: 79
