Analysis-By-Synthesis Modeling of Bengali Intonation

Published: 2022, Last Modified: 08 Jan 2026. Venue: SPECOM 2022. License: CC BY-SA 4.0
Abstract: Deriving natural-sounding synthesized speech depends chiefly on an objective mapping between the formal and functional representations of prosody in human speech. Alongside stress, rhythm, and duration, intonation is the component of prosody that contributes most to the naturalness of synthetic speech. Recent prosodic studies of Bengali and their applications have used the Autosegmental-Metrical and Fujisaki models, but there remains much scope for improving the naturalness of synthetic speech in existing TTS systems. In this paper, we study Bengali intonation patterns with Momel-INTSINT, a language-independent, hybrid phonetic-phonological model. The analysis-by-synthesis paradigm involves automatic symbolic coding of the prosodic form with INTSINT (INternational Transcription System for INTonation), which is derived from the output of the Momel (Modelling Melody) algorithm; Momel stylizes the raw F0 curve to reduce the complex acoustic data to a simplified model. This symbolic representation then becomes the input to the ProZed tool for generating synthetic speech. Our study is based on the prosodically representative sentence set of Bengali speech developed by CDAC-Kolkata. The automatic labeling framework of INTSINT tones helps in precise modeling of intonation patterns within the hierarchical prosodic units of accentual, intermediate, and intonation phrases in Bengali utterances.
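To illustrate the synthesis half of the analysis-by-synthesis loop, the following is a minimal sketch of how a sequence of INTSINT symbols can be decoded back into F0 target values. It assumes the key/range formulation commonly described for INTSINT, in which absolute tones (T, M, B) are fixed by a speaker's key and range, and relative tones (H, L, S, U, D) are computed from the preceding target. The function name `intsint_to_f0`, the starting reference for an initial relative tone, and the example key and range values are hypothetical and are not taken from the paper or from ProZed.

```python
import math


def intsint_to_f0(tones, key_hz, range_octaves):
    """Decode INTSINT symbols into F0 targets in Hz.

    Assumed mapping (key/range formulation):
      T, M, B are absolute: top, key, and bottom of the speaker's range.
      H, L, S, U, D are relative to the previous target.
    """
    half_span = math.sqrt(2 ** range_octaves)
    top = key_hz * half_span      # upper limit of the pitch range
    bottom = key_hz / half_span   # lower limit of the pitch range

    targets = []
    prev = key_hz  # assumption: an initial relative tone is computed from the key
    for tone in tones:
        if tone == "T":
            f0 = top
        elif tone == "M":
            f0 = key_hz
        elif tone == "B":
            f0 = bottom
        elif tone == "H":
            f0 = math.sqrt(prev * top)
        elif tone == "L":
            f0 = math.sqrt(prev * bottom)
        elif tone == "S":
            f0 = prev
        elif tone == "U":
            # smaller upward step than H
            f0 = math.sqrt(prev * math.sqrt(prev * top))
        elif tone == "D":
            # smaller downward step than L
            f0 = math.sqrt(prev * math.sqrt(prev * bottom))
        else:
            raise ValueError(f"unknown INTSINT tone: {tone!r}")
        targets.append(f0)
        prev = f0
    return targets


if __name__ == "__main__":
    # Hypothetical speaker parameters: key 200 Hz, range 1 octave.
    for tone, f0 in zip("MTDLB", intsint_to_f0("MTDLB", 200.0, 1.0)):
        print(tone, round(f0, 1))
```

In an analysis step, the key and range would be chosen so that the decoded targets best match the Momel target points; here they are simply supplied by hand.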