On the limitations of voice conversion techniques in emotion identification tasks

Published: 01 Jan 2007, Last Modified: 01 Oct 2024. Venue: INTERSPEECH 2007. License: CC BY-SA 4.0
Abstract: The growing interest in emotional speech synthesis calls for effective emotion conversion techniques. This paper estimates the relevance of three speech components (spectral envelope, residual excitation and prosody) for synthesizing identifiable emotional speech, so that voice conversion techniques can be customized to the specific characteristics of each emotion. The analysis is based on a listening test with a set of synthetic mixed-emotion utterances, each of which draws its speech components from emotional and neutral recordings. The results show the importance of transforming the residual excitation for identifying emotions that are not fully conveyed through prosody (such as cold anger or sadness in our Spanish corpus).
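The abstract does not state how the spectral envelope and the residual excitation were separated. One common way to do this is linear prediction (LPC): an all-pole filter models the envelope, and inverse filtering leaves the residual. The sketch below assumes a plain autocorrelation-method LPC applied to a single frame and shows how one mixed component could be built, keeping the residual of an emotional frame while imposing the envelope of a neutral one. It is an illustration under that assumption, not the authors' method; all function names (including `swap_envelope`) are hypothetical, and prosody transfer (pitch, duration, energy) is not covered.

```python
def autocorr(x, order):
    """Biased autocorrelation r[0..order] of the frame x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] with x[n] ~ sum_k a[k] * x[n-k].

    a[0] is unused and left at 0.0. A silent frame (r[0] == 0) yields all zeros.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    if err <= 0.0:
        return a
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
        if err <= 0.0:  # numerically perfect prediction; stop early
            break
    return a


def inverse_filter(x, a):
    """Residual excitation e[n] = x[n] - sum_k a[k] x[n-k], zero initial state."""
    p = len(a) - 1
    return [
        x[n] - sum(a[k] * x[n - k] for k in range(1, p + 1) if n - k >= 0)
        for n in range(len(x))
    ]


def synthesis_filter(e, a):
    """All-pole resynthesis y[n] = e[n] + sum_k a[k] y[n-k] (the envelope filter)."""
    p = len(a) - 1
    y = []
    for n in range(len(e)):
        y.append(e[n] + sum(a[k] * y[n - k] for k in range(1, p + 1) if n - k >= 0))
    return y


def swap_envelope(neutral, emotional, order=10):
    """Hypothetical helper: emotional residual excitation + neutral spectral envelope."""
    a_neu = levinson_durbin(autocorr(neutral, order), order)
    a_emo = levinson_durbin(autocorr(emotional, order), order)
    residual_emo = inverse_filter(emotional, a_emo)
    return synthesis_filter(residual_emo, a_neu)
```

Swapping the arguments gives the opposite mix (neutral residual, emotional envelope). A real system would apply this frame by frame on windowed, time-aligned recordings with overlap-add, and would handle prosody separately; those steps are omitted here to keep the sketch short.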