HMM-based speech synthesis adaptation using noisy data: Analysis and evaluation methods

Reima Karhila, Ulpu Remes, Mikko Kurimo

Published: 2013, ICASSP 2013

Abstract: This paper investigates the role of noise in speaker adaptation of HMM-based text-to-speech (TTS) synthesis and presents a new evaluation procedure. Two components improve the evaluation of synthetic speech when noise is present: a new listening test based on ITU-T Recommendation P.835, and a perceptually motivated objective measure, frequency-weighted segmental SNR. Evaluating voices adapted with noisy data shows that noise plays a relatively small but noticeable role in the quality of synthetic speech. The noise does not significantly affect naturalness or speaker similarity, but listeners prefer voices trained on cleaner data. Noise removal improves the synthetic voice, even when it degrades the quality of the natural speech.
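The abstract names frequency-weighted segmental SNR without defining it. The following is a minimal Python sketch of one common formulation, not the authors' implementation. Several details are assumptions and should be treated as hypothetical: the function name `fw_segmental_snr`, the Hann-windowed frames of 512 samples with a hop of 256, the grouping of FFT bins into 25 uniform bands (a simplification of the critical bands usually used), the band weights $W = |X|^{\gamma}$ with $\gamma = 0.2$, and clipping of each per-band SNR to $[-10, 35]$ dB. Per frame, each band's SNR is $10\log_{10}\big(|X|^2 / (|X| - |\hat{X}|)^2\big)$. The frame score is the weighted mean over bands, and the final score is the mean over frames.

```python
import numpy as np


def fw_segmental_snr(clean, processed, frame_len=512, hop=256, n_bands=25,
                     gamma=0.2, snr_min=-10.0, snr_max=35.0):
    """Frequency-weighted segmental SNR in dB (illustrative sketch).

    Assumed (hypothetical) choices: Hann-windowed frames, FFT bins grouped
    into `n_bands` uniform bands instead of critical bands, band weights
    |X|**gamma taken from the clean reference, per-band SNR clipped to
    [snr_min, snr_max] dB.
    """
    clean = np.asarray(clean, dtype=float)
    processed = np.asarray(processed, dtype=float)
    n = min(len(clean), len(processed))
    if n < frame_len:
        raise ValueError("signals must be at least one frame long")

    window = np.hanning(frame_len)
    eps = np.finfo(float).eps
    n_bins = frame_len // 2 + 1
    # Uniform band edges over the rfft bins (simplification, see docstring).
    edges = np.linspace(0, n_bins, n_bands + 1).astype(int)

    frame_scores = []
    for start in range(0, n - frame_len + 1, hop):
        X = np.abs(np.fft.rfft(window * clean[start:start + frame_len]))
        Y = np.abs(np.fft.rfft(window * processed[start:start + frame_len]))
        # Average the magnitude spectrum within each band.
        Xb = np.array([X[a:b].mean() for a, b in zip(edges[:-1], edges[1:])])
        Yb = np.array([Y[a:b].mean() for a, b in zip(edges[:-1], edges[1:])])
        # Per-band SNR: clean energy over the magnitude-difference energy.
        band_snr = 10.0 * np.log10(Xb ** 2 / ((Xb - Yb) ** 2 + eps) + eps)
        band_snr = np.clip(band_snr, snr_min, snr_max)
        # Perceptual weighting: louder clean bands count more.
        W = Xb ** gamma
        frame_scores.append(np.sum(W * band_snr) / (np.sum(W) + eps))

    # Segmental: average the per-frame scores.
    return float(np.mean(frame_scores))
```

A higher score means the processed signal is closer to the clean reference. Identical signals reach the upper clipping limit of 35 dB, while adding more noise lowers the score. Clipping keeps a few silent or near-perfect frames from dominating the average, which is the main reason segmental measures are preferred over a single global SNR.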
