Abstract: The temporal correlation of speech presence and absence is widely exploited in noise estimation. The most popular technique for exploiting it is to smooth the noisy spectra with a time-recursive filter whose forgetting factor is controlled by the speech presence probability. However, this technique is not grounded in a unified theoretical framework that would enable optimal noise estimation. In theory, hidden Markov models (HMMs) model temporal correlation better than this technique: an HMM can represent the time sequence of speech presence and absence as a dynamic process of transitions between speech and non-speech states. Moreover, a number of methods, such as maximum likelihood, are available for the optimal estimation of HMM parameters. This paper presents a constrained sequential HMM that models the log-power sequence in each frequency band. The emission probability of each HMM state is represented by a Gaussian model, and the Gaussian mean of the non-speech state is taken as the optimal estimate of the noise log-power. The HMM parameter set is estimated sequentially, frame by frame, on the basis of maximum likelihood. The proposed method is compared with well-established algorithms in various experiments. It delivers more accurate results and, unlike most algorithms, does not rely on the assumption of a “non-speech signal onset”.
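The idea of tracking noise as the mean of a non-speech HMM state can be illustrated with a minimal sketch for a single frequency band. This is not the paper's constrained sequential HMM: it assumes a fixed two-state transition matrix and swaps in a simple online (recursive) EM-style update with a forgetting factor. The class name, all parameter values, and the "minimum gap" constraint between the state means are hypothetical choices for illustration. Each frame, the forward recursion gives the posterior probability of each state, the Gaussian sufficient statistics are updated with those posteriors, and the non-speech mean is returned as the noise log-power estimate.

```python
import math
from dataclasses import dataclass


def log_gaussian(y: float, mean: float, var: float) -> float:
    """Log of the univariate Gaussian density N(y; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


@dataclass
class SequentialTwoStateHMM:
    """Hypothetical sketch: state 0 = non-speech, state 1 = speech, one frequency band."""
    noise_mean: float             # initial guess of the noise log-power
    speech_offset: float = 10.0   # hypothetical initial gap between the state means
    init_var: float = 4.0         # hypothetical initial variance of both states
    forget: float = 0.98          # hypothetical forgetting factor for the statistics
    p_stay: tuple = (0.95, 0.90)  # hypothetical self-transition probabilities (0 < p < 1)
    min_gap: float = 3.0          # hypothetical constraint: speech mean >= noise mean + min_gap
    var_floor: float = 0.1        # lower bound on each state variance

    def __post_init__(self) -> None:
        self.means = [self.noise_mean, self.noise_mean + self.speech_offset]
        self.vars = [self.init_var, self.init_var]
        self.alpha = [0.5, 0.5]  # filtered state posterior P(state | y_1..y_t)
        # Exponentially weighted sufficient statistics per state:
        # total weight, weighted sum of y, weighted sum of y^2.
        self.s0 = [1.0, 1.0]
        self.s1 = list(self.means)
        self.s2 = [v + m * m for m, v in zip(self.means, self.vars)]

    def update(self, y: float) -> float:
        """Process one log-power frame and return the current noise estimate."""
        a0, a1 = self.p_stay
        # Forward step: predicted state probabilities from the transition matrix.
        prior = [
            self.alpha[0] * a0 + self.alpha[1] * (1.0 - a1),
            self.alpha[0] * (1.0 - a0) + self.alpha[1] * a1,
        ]
        # Combine with Gaussian emission likelihoods in the log domain to avoid underflow.
        log_joint = [
            math.log(prior[k]) + log_gaussian(y, self.means[k], self.vars[k])
            for k in range(2)
        ]
        peak = max(log_joint)
        weights = [math.exp(lj - peak) for lj in log_joint]
        total = sum(weights)
        self.alpha = [w / total for w in weights]

        # Posterior-weighted update of each state's Gaussian parameters.
        for k in range(2):
            g = self.alpha[k]
            self.s0[k] = self.forget * self.s0[k] + g
            self.s1[k] = self.forget * self.s1[k] + g * y
            self.s2[k] = self.forget * self.s2[k] + g * y * y
            self.means[k] = self.s1[k] / self.s0[k]
            self.vars[k] = max(self.s2[k] / self.s0[k] - self.means[k] ** 2, self.var_floor)

        # Illustrative constraint: keep the speech state above the non-speech state.
        if self.means[1] < self.means[0] + self.min_gap:
            self.means[1] = self.means[0] + self.min_gap

        return self.means[0]  # non-speech mean = noise log-power estimate
```

Feeding the object one log-power value per frame (for example 10·log10 of the band's power) returns a running noise estimate. The forgetting factor plays a role similar to the smoothing constant of the recursive-averaging baseline, but here the weight given to each frame comes from the HMM state posterior, so frames judged to be speech contribute almost nothing to the noise mean.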