Adaptation and Frontend Features to Improve Naturalness in Found-Data Synthesis

25 May 2020, OpenReview Archive Direct Upload, Readers: Everyone
Abstract: We compare two approaches to training statistical parametric voices that use utterance-level acoustic and prosodic features to improve the naturalness of the resulting voices: subset adaptation, and adding new acoustic and prosodic features to the frontend. We find that labeling each utterance with a high, middle, or low value for a given feature at the frontend, and then choosing which setting to use at synthesis time, can produce voices rated significantly more natural than a baseline voice that uses only the standard contextual frontend features. This holds for both HMM-based and neural-network-based synthesis.
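The frontend-labeling approach can be illustrated with a minimal sketch. It assumes that the high/middle/low cut points are the tertiles of the feature over the training corpus; the abstract does not say how the bins are defined. It also assumes a hypothetical `/name:setting` suffix appended to each context label. All function names here are illustrative and are not taken from the paper.

```python
from statistics import quantiles


def tertile_thresholds(values):
    """Return the two cut points that split values into three equal-count bins.

    Assumption: tertiles are used as bin boundaries (not specified in the source).
    """
    lo_cut, hi_cut = quantiles(values, n=3, method="inclusive")
    return lo_cut, hi_cut


def label_value(value, lo_cut, hi_cut):
    """Map one utterance-level feature value to a coarse 'low'/'mid'/'high' label."""
    if value < lo_cut:
        return "low"
    if value > hi_cut:
        return "high"
    return "mid"


def label_corpus(feature_by_utt):
    """Label every training utterance, e.g. by its mean F0 or speaking rate."""
    lo_cut, hi_cut = tertile_thresholds(list(feature_by_utt.values()))
    return {utt: label_value(v, lo_cut, hi_cut) for utt, v in feature_by_utt.items()}


def add_utterance_feature(context_labels, feature_name, setting):
    """Append one utterance-level setting to every context label of an utterance.

    The '/name:setting' suffix is a hypothetical format for illustration only.
    """
    return [f"{label}/{feature_name}:{setting}" for label in context_labels]


if __name__ == "__main__":
    # Training time: each utterance gets the label matching its own feature value.
    train_labels = label_corpus({"u1": 1.0, "u2": 2.0, "u3": 3.0,
                                 "u4": 4.0, "u5": 5.0, "u6": 6.0})
    print(train_labels)
    # Synthesis time: the setting is chosen freely, e.g. always "high".
    print(add_utterance_feature(["ctx_a", "ctx_b"], "f0mean", "high"))
```

The design choice this illustrates is that the model sees the coarse label during training, so at synthesis time the same label becomes a control input: fixing it to one setting for every utterance is what "choosing which setting to use at synthesis time" would look like under these assumptions.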
