Feature Extraction Using Power-Law Adjusted Linear Prediction With Application to Speaker Recognition Under Severe Vocal Effort Mismatch

Rahim Saeidi, Paavo Alku, Tom Bäckström

Published: 2016, Last Modified: 01 Oct 2024. IEEE ACM Trans. Audio Speech Lang. Process. 2016. CC BY-SA 4.0

Abstract: Linear prediction is one of the most established techniques in signal estimation and is widely used in speech signal processing. It has long been understood that the nerve firing rate of the human auditory system can be approximated by a power-law non-linearity, and this has motivated the use of perceptual linear prediction for extracting acoustic features in a variety of speech processing applications. In this paper, we revisit the application of the power-law non-linearity to speech spectrum estimation by compressing or expanding the power spectrum in autocorrelation-based linear prediction. The development of the resulting method, called LP-α, is motivated by a desire to obtain spectral features that exhibit less mismatch than conventional spectrum estimation methods when speech of normal loudness is compared with speech produced under vocal effort. The effectiveness of the proposed approach is demonstrated in a speaker recognition task conducted under severe vocal effort mismatch, comparing shouted and normal speech.
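
The idea of compressing or expanding the power spectrum before autocorrelation-based linear prediction can be illustrated with a minimal sketch. It assumes one plausible reading of the abstract: the power spectrum of a windowed frame is raised to an exponent α, the inverse FFT of the result is used as the autocorrelation sequence, and the usual normal equations are then solved. The function name `lp_alpha`, the Hamming window, the FFT length, the diagonal loading, and all default values are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def lp_alpha(frame, order=20, alpha=1.0, nfft=512):
    """Hypothetical sketch: LP coefficients [1, a_1, ..., a_p] from a
    power-law adjusted power spectrum (alpha < 1 compresses, alpha > 1 expands).
    Assumes alpha > 0 and nfft >= 2 * len(frame) - 1."""
    windowed = frame * np.hamming(len(frame))
    # Power spectrum of the windowed frame.
    power = np.abs(np.fft.rfft(windowed, nfft)) ** 2
    # Power-law adjustment; alpha = 1 leaves the spectrum unchanged.
    adjusted = power ** alpha
    # Inverse FFT of a power spectrum gives an autocorrelation sequence
    # (Wiener-Khinchin); keep lags 0..order.
    r = np.fft.irfft(adjusted, nfft)[: order + 1]
    # Build the Toeplitz autocorrelation matrix R[i, j] = r[|i - j|].
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    R = r[lags]
    # Tiny diagonal loading for numerical stability (an assumption of this sketch).
    R = R + 1e-9 * r[0] * np.eye(order)
    # Normal equations: R a = -r[1:].
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))

# Example usage on a synthetic frame (random data, for illustration only).
rng = np.random.default_rng(0)
frame = rng.standard_normal(200)
a_standard = lp_alpha(frame, order=10, alpha=1.0)    # conventional LP
a_compressed = lp_alpha(frame, order=10, alpha=0.5)  # compressed spectrum
```

With `alpha=1.0` the sketch reduces to conventional autocorrelation-method linear prediction, which makes it easy to compare against a standard implementation. A production version would more likely use the Levinson-Durbin recursion than a general linear solver.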