Abstract: A dynamical system model is proposed to better represent the spectral dynamics of speech for recognition. The observed feature vectors of a phone segment are assumed to be the output of a stochastic linear dynamical system, and two alternative assumptions about how the evolution of the dynamics relates to segment length are considered. Training is then equivalent to identifying a stochastic linear system, and a nontraditional approach based on the estimate-maximize (EM) algorithm is followed. The model is evaluated on a phoneme classification task using the TIMIT database. The classification performance obtained with the proposed model is shown to be significantly better than that obtained under either an independent-frame or a Gauss-Markov assumption on the observed frames.
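As a hedged illustration of how a stochastic linear system can score a phone segment for classification, the sketch below assumes the standard state-space form x[k+1] = F x[k] + w[k], y[k] = H x[k] + v[k] with Gaussian noise. The abstract does not give the exact parameterization, so the matrices F, H, Q, R, mu0, P0, the functions `lds_log_likelihood`, `simulate`, and `classify`, and the two phone models "A" and "B" are all hypothetical. Parameters are hand-picked rather than trained with the paper's EM-based system identification, and the dynamics are held fixed over the segment, ignoring the segment-length assumptions the paper considers.

```python
import numpy as np


def lds_log_likelihood(Y, F, H, Q, R, mu0, P0):
    """Exact Gaussian log-likelihood of a segment Y (T x p) under the
    (assumed) time-invariant model x[k+1] = F x[k] + w[k], y[k] = H x[k] + v[k],
    with w ~ N(0, Q), v ~ N(0, R), x[0] ~ N(mu0, P0), accumulated from
    Kalman-filter innovations (prediction-error decomposition)."""
    x_pred, P_pred = mu0, P0
    loglik = 0.0
    for y in Y:
        # Innovation: difference between the observed frame and its one-step prediction.
        e = y - H @ x_pred
        S = H @ P_pred @ H.T + R
        _, logdet = np.linalg.slogdet(S)
        loglik -= 0.5 * (len(y) * np.log(2 * np.pi) + logdet + e @ np.linalg.solve(S, e))
        # Measurement update: condition the state estimate on the current frame.
        K = P_pred @ H.T @ np.linalg.inv(S)
        x_filt = x_pred + K @ e
        P_filt = P_pred - K @ H @ P_pred
        # Time update: propagate the state estimate to the next frame.
        x_pred = F @ x_filt
        P_pred = F @ P_filt @ F.T + Q
    return loglik


def simulate(T, F, H, Q, R, mu0, P0, rng):
    """Draw T observation frames from the same hypothetical model."""
    x = rng.multivariate_normal(mu0, P0)
    frames = []
    for _ in range(T):
        frames.append(H @ x + rng.multivariate_normal(np.zeros(H.shape[0]), R))
        x = F @ x + rng.multivariate_normal(np.zeros(len(mu0)), Q)
    return np.array(frames)


def classify(Y, models):
    """Return the phone label whose model assigns the segment the highest likelihood."""
    return max(models, key=lambda label: lds_log_likelihood(Y, *models[label]))


# Two hypothetical 2-D phone models that differ only in their initial state mean.
I2 = np.eye(2)
models = {
    "A": (0.95 * I2, I2, 0.01 * I2, 0.1 * I2, np.array([3.0, 3.0]), 0.1 * I2),
    "B": (0.95 * I2, I2, 0.01 * I2, 0.1 * I2, np.array([-3.0, -3.0]), 0.1 * I2),
}

rng = np.random.default_rng(0)
segment = simulate(20, *models["A"], rng)
print(classify(segment, models))
```

Scoring each candidate model and taking the maximum is the usual maximum-likelihood classification rule. An independent-frame baseline would instead sum per-frame Gaussian log-densities, discarding the frame-to-frame correlation that the hidden state carries.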