Formant Frequency Estimation of High-Pitched Speech by Homomorphic Prediction
Abstract: Conventional linear prediction analysis has difficulty estimating the vocal tract
characteristics of high-pitched speakers. This is because the autocorrelation
function used by the autocorrelation method of linear prediction for estimating autoregressive
coefficients is actually an “aliased” version of that of the vocal tract impulse response. This “aliasing”
occurs due to the periodic nature of voiced speech. It is generally accepted that homomorphic filtering
can be used to obtain an estimate of the vocal tract impulse response that is free from periodicity. Thus
linear prediction of the resulting vocal tract impulse response (referred to as homomorphic prediction)
is expected to be unaffected by variations in fundamental frequency. To our knowledge, however, no
experimental study has yet appeared on the suitability of this method for analyzing high-pitched
speech. This paper presents a detailed study of the prospects of homomorphic prediction as a formant
tracking tool, especially for high-pitched speech, where linear prediction fails to obtain accurate
estimates. The formant frequencies estimated using the proposed method are found to be more accurate
than those of the conventional procedure by more than an order of magnitude. The accuracy of formant
estimation is verified on synthetic vowels for a wide range of pitch periods covering typical male and
high-pitched female speakers. The validity of the proposed method is also examined by inspecting the
spectral envelopes of natural speech spoken by high-pitched female speakers. We note that almost
all previous methods dealing with this limitation of linear prediction are based on the covariance
technique, in which the obtained AR filter can be unstable. The solutions obtained by the current method
are guaranteed to be stable, which makes it superior for many speech analysis applications.
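
The processing chain described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `homomorphic_prediction`, the FFT size, the lifter cutoff `n_lifter`, the Hamming window and the minimum-phase reconstruction are all assumptions made for illustration. The frame is low-time liftered in the cepstral domain to suppress the pitch-related component, a minimum-phase impulse response is rebuilt from the liftered cepstrum, and the autocorrelation method of linear prediction is then applied to that response.

```python
import numpy as np


def homomorphic_prediction(frame, order=10, n_lifter=30, n_fft=1024):
    """Sketch: AR coefficients a[0..order] (a[0] = 1) for one voiced frame.

    All parameter names and default values are illustrative assumptions.
    """
    frame = np.asarray(frame, dtype=float)

    # Real cepstrum: inverse FFT of the log magnitude spectrum.
    spectrum = np.fft.fft(frame * np.hamming(len(frame)), n_fft)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small offset avoids log(0)
    cepstrum = np.fft.ifft(log_mag).real

    # Low-time lifter: keep quefrencies below n_lifter (assumed shorter than
    # the pitch period) and fold them into a causal, minimum-phase cepstrum.
    lifter = np.zeros(n_fft)
    lifter[0] = 1.0
    lifter[1:n_lifter] = 2.0

    # Estimated vocal tract impulse response (minimum-phase assumption).
    h = np.fft.ifft(np.exp(np.fft.fft(cepstrum * lifter))).real

    # Autocorrelation method of linear prediction applied to h.
    r = np.array([np.dot(h[:n_fft - k], h[k:]) for k in range(order + 1)])
    lags = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    a_tail = np.linalg.solve(r[lags], -r[1:order + 1])  # Toeplitz normal equations
    return np.concatenate(([1.0], a_tail))
```

In this sketch `n_lifter` should be chosen shorter than the pitch period in samples. Formant candidates could then be read from the angles of the complex roots of the returned polynomial, for example `np.angle(np.roots(a)) * fs / (2 * np.pi)` for a sampling rate `fs`.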