Incorporating ASR Errors with Attention-Based, Jointly Trained RNN for Intent Detection and Slot Filling

Abstract: The real-world performance of slot filling and intent detection generally degrades due to transcription errors produced by the speech recognition engine. Insertion, deletion, and mis-recognition errors from the speech recognizer's front end cause misinterpretation and misalignment in language understanding models. In this work, we propose a new jointly trained model for intent detection and slot filling that takes speech recognition errors into account. An attention-based encoder-decoder recurrent neural network first decodes the intent information from an utterance, then corrects any errors in the word sequence before extracting the slot information. The triple joint training framework maximizes the probability of a correct understanding given an input utterance. Our experimental results show that the proposed model obtains a 2.87% absolute gain in slot filling over the joint model without ASR error correction, and a 0.73% absolute error rate reduction for intent detection.
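The triple joint training objective described above can be sketched as a combined negative log-likelihood over the three sub-tasks. This is a minimal illustration only, not the paper's implementation: the function name and the assumption that the three terms are simply summed (rather than weighted) are hypothetical.

```python
import math

def triple_joint_nll(p_intent, p_correction, p_slots):
    """Hypothetical sketch of a triple joint training loss.

    p_intent:     model probability of the correct intent label
    p_correction: model probability of the corrected word sequence
    p_slots:      per-token model probabilities of the correct slot tags

    Maximizing the joint probability of a correct understanding is
    equivalent to minimizing the summed negative log-likelihoods.
    """
    nll_intent = -math.log(p_intent)
    nll_correction = -math.log(p_correction)
    nll_slots = -sum(math.log(p) for p in p_slots)
    return nll_intent + nll_correction + nll_slots

# Example: a perfectly confident model incurs zero loss.
loss = triple_joint_nll(1.0, 1.0, [1.0, 1.0])
```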