Robust automatic speech recognition using acoustic model adaptation prior to missing feature reconstruction

Ulpu Remes, Kalle J. Palomäki, Mikko Kurimo

Published: 2009 (EUSIPCO 2009), Last Modified: 02 May 2023

Abstract: When speech recognition is used in real-world environments, simultaneous speaker and environmental adaptation and compensation for time-varying noise effects are needed. Noise compensation methods such as missing feature reconstruction should therefore be combined with adaptation methods such as constrained maximum likelihood linear regression (CMLLR). The combination is straightforward only if reconstruction is applied prior to CMLLR. In this work, reconstruction is modified so that CMLLR transformations can be estimated prior to reconstruction. The new approach is evaluated on large vocabulary speech data recorded in noisy public and car environments and compared to using reconstruction prior to CMLLR estimation. The results suggest that the noise environment determines which approach is better. Using adaptation prior to reconstruction performs better when evaluated on data from public environments. The relative reductions in letter error rate were 47-50 % compared to the baseline and 13-19 % compared to using either adaptation or reconstruction alone.
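
The abstract reports results as relative reductions in letter error rate. As a minimal sketch of how such figures are commonly computed, the Python below assumes the usual definition of letter error rate (letter-level edit distance divided by the reference length) and the usual definition of relative reduction. The function names are hypothetical, and the paper's exact scoring conventions (for example, whether spaces count as letters) are not stated in the abstract.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of letter substitutions, insertions and deletions
    needed to turn the reference string into the hypothesis string."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def letter_error_rate(ref: str, hyp: str) -> float:
    """Letter error rate in percent (assumed definition, see lead-in)."""
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * levenshtein(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error reduction in percent with respect to a baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return 100.0 * (baseline - improved) / baseline
```

With made-up numbers that are not taken from the paper, a system that lowers the letter error rate from 20.0 % to 10.0 % achieves `relative_reduction(20.0, 10.0) == 50.0`, that is, a 50 % relative reduction.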
