Analysis of emotional speech using an adaptive sinusoidal model

George P. Kafentzis, Theodora Yakoumaki, Athanasios Mouchtaris, Yannis Stylianou

Published: 2014, Last Modified: 15 Mar 2024, EUSIPCO 2014

Abstract: Processing of emotional (or expressive) speech has gained attention in the speech community over recent years due to its numerous applications. In this paper, an adaptive sinusoidal model (aSM), dubbed the extended adaptive Quasi-Harmonic Model (eaQHM), is employed to analyze emotional speech in terms of accurate, robust, continuous, time-varying parameters (amplitude, frequency, and phase). It is shown that these parameters can adequately and accurately represent emotional speech content. Using a well-known database of narrowband expressive speech (SUSAS), we show that very high Signal-to-Reconstruction-Error Ratio (SRER) values can be obtained relative to the standard sinusoidal model (SM). Formal listening tests on a smaller wideband speech database show that the eaQHM outperforms the SM in perceptual resynthesis quality. Finally, preliminary emotion classification tests show that the parameters obtained from the adaptive model lead to a higher classification score than the standard SM parameters.
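
To make the SRER comparison concrete, here is a minimal Python sketch. It assumes SRER is defined as 20 log10 of the ratio between the standard deviation of the original signal and that of the reconstruction error, which is the usual definition in the adaptive sinusoidal modeling literature but is not stated in the abstract. The function names `srer_db` and `synth_sinusoids` and the toy signal are hypothetical illustrations. This is not an implementation of eaQHM and it does not use SUSAS data.

```python
import numpy as np

def srer_db(x, x_hat):
    """SRER in dB, assuming the definition 20*log10(std(x) / std(x - x_hat))."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    err_std = np.std(x - x_hat)
    if err_std == 0.0:
        return float("inf")  # perfect reconstruction
    return 20.0 * np.log10(np.std(x) / err_std)

def synth_sinusoids(amps, phases):
    """Sum of A_k[n] * cos(phi_k[n]); rows are components, columns are samples."""
    return np.sum(np.asarray(amps) * np.cos(np.asarray(phases)), axis=0)

# Toy "voiced" signal (assumption): three harmonics of a gliding fundamental.
fs = 8000                                  # narrowband sampling rate in Hz
t = np.arange(0, 0.05, 1.0 / fs)           # 50 ms of samples
f0 = 150.0 + 400.0 * t                     # fundamental rising from 150 Hz
phase0 = 2.0 * np.pi * np.cumsum(f0) / fs  # running phase of the fundamental
ks = np.arange(1, 4)[:, None]              # harmonic indices 1..3 as a column
amps = (1.0 / ks) * np.ones_like(t)        # constant amplitudes 1, 1/2, 1/3

x = synth_sinusoids(amps, ks * phase0)

# Stationary approximation: each harmonic frozen at the mean fundamental.
x_stationary = synth_sinusoids(amps, ks * 2.0 * np.pi * np.mean(f0) * t)

# Reconstruction that follows the true time-varying phase, with a 1% gain error.
x_tracking = 0.99 * x

print("stationary SRER (dB):", srer_db(x, x_stationary))
print("tracking SRER (dB):", srer_db(x, x_tracking))
```

In this toy setup, the reconstruction that follows the time-varying phase has an error equal to 1% of the signal, so its SRER is about 40 dB, while the stationary approximation drifts out of phase and scores far lower. The sketch only illustrates why a model with time-varying parameters can reach higher SRER on non-stationary speech; it says nothing about the actual values reported in the paper.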
