High-resolution sinusoidal modeling of unvoiced speech

George P. Kafentzis, Yannis Stylianou

Published: 2016, Last Modified: 15 Mar 2024ICASSP 2016Readers: Everyone

Abstract: In this paper, a recently proposed high-resolution Sinusoidal Model, dubbed the extended adaptive Quasi-Harmonic Model (eaQHM), is applied on modeling unvoiced speech sounds. Unvoiced speech sounds are parts of speech that are highly non-stationary in the time-frequency plane. Standard sinusoidal models fail to model them accurately and efficiently, thus introducing artefacts, while the reconstructed signals do not attain the quality and naturalness of the originals. Motivated by recently proposed non-stationary transforms, such as the Fan-Chirp Transform (FChT), eaQHM is tested to confront these effects and it is shown that highly accurate, artefact-free representations of unvoiced sounds are possible using the non-stationary properties of the model. Experiments on databases of unvoiced sounds show that, on average, eaQHM improves the Signal to Reconstruction Error Ratio (SRER) obtained by the standard Sinusoidal Model (SM) by 93%. Moreover, modeling superiority is also supported via informal listening tests with two other models, namely the SM and the well-known STRAIGHT method.

0 Replies