Abstract: In this paper, we address issues associated with feature-space maximum likelihood linear regression (fMLLR) adaptation when it is applied to non-native speech recognition. In particular, we propose fMLLR smoothing to compensate for mismatches between adaptation and test data that are caused by the various disfluencies of non-native speakers. The proposed fMLLR smoothing is performed during Viterbi decoding and implemented at two levels: a Gaussian mixture probability density function (mpdf) level and an observation probability density function (opdf) level. The mpdf-level smoothing compares, for each Gaussian mixture component, the probability density function (pdf) evaluated at the original speech feature vector with that evaluated at its fMLLR-transformed counterpart. In contrast, the opdf-level smoothing compares the Gaussian mixture probabilities of the original feature vector and its fMLLR-transformed version. Non-native automatic speech recognition (ASR) experiments on a Korean-spoken English continuous speech corpus show that an ASR system employing the proposed mpdf-level and opdf-level fMLLR smoothing methods reduces the average word error rate by 30.65% and 29.82%, respectively, relative to a traditional fMLLR adaptation method.
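The two smoothing levels can be sketched for a single HMM state modeled by a diagonal-covariance Gaussian mixture. The abstract says only that the original and fMLLR-transformed likelihoods are "compared"; this sketch assumes the comparison keeps the larger of the two (a max rule) and, for simplicity, omits the log|det A| Jacobian term that usually accompanies fMLLR scores. All function names (`apply_fmllr`, `mpdf_smoothed_loglik`, `opdf_smoothed_loglik`) and the example parameters are hypothetical and are not the authors' implementation. In a Viterbi decoder, either smoothed score would stand in for the state's observation log-likelihood; the decoder itself is not shown.

```python
import math
from typing import List, Sequence

Vector = List[float]


def apply_fmllr(A: Sequence[Sequence[float]], b: Sequence[float], x: Sequence[float]) -> Vector:
    """fMLLR feature-space transform: x' = A x + b."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i for row, b_i in zip(A, b)]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var)
    )


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(values)))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def mixture_loglik(x, weights, means, variances) -> float:
    """Unsmoothed log observation pdf of a Gaussian mixture at x."""
    return log_sum_exp(
        [math.log(w) + log_gauss_diag(x, mu, var) for w, mu, var in zip(weights, means, variances)]
    )


def mpdf_smoothed_loglik(x, x_fmllr, weights, means, variances) -> float:
    """Mixture-pdf-level smoothing (ASSUMED max rule): for each component,
    keep the larger component log-density of the original or transformed vector."""
    terms = []
    for w, mu, var in zip(weights, means, variances):
        best = max(log_gauss_diag(x, mu, var), log_gauss_diag(x_fmllr, mu, var))
        terms.append(math.log(w) + best)
    return log_sum_exp(terms)


def opdf_smoothed_loglik(x, x_fmllr, weights, means, variances) -> float:
    """Observation-pdf-level smoothing (ASSUMED max rule): keep the larger
    full-mixture log-likelihood of the original or transformed vector."""
    return max(
        mixture_loglik(x, weights, means, variances),
        mixture_loglik(x_fmllr, weights, means, variances),
    )


if __name__ == "__main__":
    # Illustrative 2-component, 2-dimensional mixture (made-up parameters).
    weights = [0.6, 0.4]
    means = [[0.0, 0.0], [1.0, 1.0]]
    variances = [[1.0, 1.0], [0.5, 0.5]]
    A = [[0.9, 0.0], [0.0, 1.1]]  # hypothetical fMLLR matrix
    b = [0.1, -0.2]               # hypothetical fMLLR bias
    x = [0.5, 1.0]
    x_t = apply_fmllr(A, b, x)
    print("unadapted :", mixture_loglik(x, weights, means, variances))
    print("fMLLR     :", mixture_loglik(x_t, weights, means, variances))
    print("mpdf-level:", mpdf_smoothed_loglik(x, x_t, weights, means, variances))
    print("opdf-level:", opdf_smoothed_loglik(x, x_t, weights, means, variances))
```

Under the assumed max rule, the mpdf-level score is never lower than the opdf-level score, because choosing the better vector separately for each component can only increase the weighted mixture sum. When the transform is the identity (A = I, b = 0), both smoothed scores reduce to the unadapted mixture log-likelihood.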