Abstract: With the increased use of speech recognition outside of the lab environment, the need for better out-of-vocabulary (OOV) rejection techniques is critical for the continued success of this user interface. Not only must future speech recognition systems accurately reject OOV utterances, but they must also maintain their performance in mismatched (i.e. noisy) conditions. In this paper, we extend our work on low-complexity, high-accuracy speaker independent speech recognition. We present a novel rejection criterion that is shown to be robust in mismatched conditions. This technique continues our emphasis on speech recognition for resource limited applications, by providing a solution that is highly scalable, requiring no additional memory and no significant increase in computation. The technique is based on the use of multiple garbage models (on the order of 100 or more) and a novel ranking method to achieve robust performance. This method allows for a data dependent approach in order to optimize the performance over each class individually. Results are presented for a large database consisting of 166 speakers and 131 classes. Out-of-class rejection is based on 118 out-of-vocabulary phrases and 3 categories of spurious inputs (breath noise, coughs, and lipsmack). Performance is shown to be superior to the approximated optimal Bayes reject rule.
Loading