Spoken Keyword Retrieval Using Source and System Features

Published: 01 Jan 2017, Last Modified: 13 Apr 2025, PReMI 2017, CC BY-SA 4.0
Abstract: In this paper, a novel excitation source-related feature set, viz., Teager Energy-based Mel Frequency Cepstral Coefficients (T-MFCC), is proposed for the task of spoken keyword detection. Experiments are carried out on the TIMIT database. The state-of-the-art MFCC feature set is used as the baseline spectral feature set, which implicitly represents vocal tract (i.e., system) information. The idea is to exploit both the vocal-source information (and its nonlinear coupling with the formants) and the system-related information embedded in the spoken query. Experimental results show an EER of 17.23 % for MFCC and 22.58 % for the proposed T-MFCC features. However, a significant reduction in EER of 1.8 % (as compared to MFCC alone) is observed when evidence from T-MFCC and MFCC is combined using score-level fusion, indicating that the proposed feature set captures linguistic information (in the spoken keyword) that is complementary to the information captured by MFCC.
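The three ingredients named in the abstract (the Teager energy operator, score-level fusion, and the EER metric) can be illustrated with a minimal Python sketch. It uses the standard discrete Teager energy operator, x[n]^2 - x[n-1]*x[n+1]. Everything else is an assumption made for illustration and is not taken from the paper: the function names, the weighted-sum fusion rule, the fusion weight of 0.5, and the threshold-sweep EER estimate. The abstract does not specify how the Teager energy enters the cepstral pipeline, which fusion rule is used, or how scores are normalized, so none of that is reproduced here.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, since the first and last samples have no
    two-sided neighbourhood.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def fuse_scores(scores_a, scores_b, weight=0.5):
    """Weighted-sum score-level fusion (assumed rule, hypothetical weight).

    Both score arrays are assumed to be on a comparable scale already.
    """
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    return weight * scores_a + (1.0 - weight) * scores_b


def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold. The EER is
    taken at the threshold where the false-acceptance rate and the
    false-rejection rate are closest, as the mean of the two rates.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.unique(np.concatenate([target, nontarget]))
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    frr = np.array([(target < t).mean() for t in thresholds])
    best = np.argmin(np.abs(far - frr))
    return 0.5 * (far[best] + frr[best])


if __name__ == "__main__":
    # Synthetic example only; these are not the paper's scores.
    rng = np.random.default_rng(0)
    tgt_a, non_a = rng.normal(1.0, 1.0, 500), rng.normal(0.0, 1.0, 500)
    tgt_b, non_b = rng.normal(1.0, 1.0, 500), rng.normal(0.0, 1.0, 500)
    print("EER, system A:", equal_error_rate(tgt_a, non_a))
    print("EER, system B:", equal_error_rate(tgt_b, non_b))
    print("EER, fused   :", equal_error_rate(fuse_scores(tgt_a, tgt_b),
                                             fuse_scores(non_a, non_b)))
```

For a pure sinusoid A cos(Ωn + φ), the operator returns the constant A² sin²(Ω), so it tracks amplitude and frequency jointly. In the synthetic demo the two score streams are statistically independent, which is the idealized case in which fusion helps; real systems only benefit to the extent that their errors are complementary.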