Strategies for developing a conversational speech data set for Text-to-Speech Synthesis

Adaeze Adigwe, Esther Klabbers

Published: 21 Sept 2022, Last Modified: 15 Apr 2026, Interspeech 2022, Everyone, CC BY 4.0

Abstract: There have been many efforts to improve the quality of speech synthesis systems in conversational AI. Although state-of-the-art systems are capable of producing natural-sounding speech, the generated speech often lacks prosodic variation and is not always suited to the task. In this paper, we examine dialogue data collection methods to use as training data for our acoustic models. We collect speech using three different setups: (1) Random read-aloud sentences; (2) Performed dialogues; (3) Semi-Spontaneous dialogues. We analyze prosodic and textual properties of the data collected in these setups and make some recommendations to collect data for speech synthesis in conversational AI settings.