Abstract: There have been many efforts to improve the quality of speech
synthesis systems in conversational AI. Although state-of-the-art
systems are capable of producing natural-sounding speech,
the generated speech often lacks prosodic variation and is not
always suited to the task. In this paper, we examine methods for
collecting dialogue data to use as training data for our acoustic
models. We collect speech using three different setups: (1) Random
read-aloud sentences; (2) Performed dialogues; (3) Semi-Spontaneous
dialogues. We analyze prosodic and textual properties of the data
collected in these setups and make recommendations for collecting
data for speech synthesis in conversational AI settings.