Formant estimation from speech signal using the magnitude spectrum modified with group delay spectrum

Md. Shahidur Rahman

Published: 28 Feb 2021, Last Modified: 01 Oct 2024OpenReview Archive Direct UploadEveryoneCC BY-ND 4.0

Abstract: The magnitude spectrum is a popular mathematical tool for speech signal analysis. In this paper, we propose a new technique for improving the performance of the magnitude spectrum by utilizing the benefits of the group delay (GD) spectrum to estimate the characteristics of a vocal tract accurately. The traditional magnitude spectrum suffers from difficulties when estimating vocal tract characteristics, particularly for high-pitched speech owing to its low resolution and high spectral leakage. After phase domain analysis, it is observed that the GD spectrum has low spectral leakage and high resolution for its additive property. Thus, the magnitude spectrum modified with its GD spectrum, referred to as the modified spectrum, is found to significantly improve the estimation of formant frequency over traditional methods. The accuracy is tested on synthetic vowels for a wide range of fundamental frequencies up to the high-pitched female speaker range. The validity of the proposed method is also verified by inspecting the formant contour of an utterance from the Texas Instruments and Massachusetts Institute of Technology (TIMIT) database and standard F2–F1 plot of natural vowel speech spoken by male and female speakers. The result is compared with two state-of-the-art methods. Our proposed method performs better than both of these two methods.