Improving Information Extraction by Modeling Errors in Speech Recognizer Output

David D. Palmer, Mari Ostendorf

2001 (modified: 16 Jul 2019), HLT 2001, Readers: Everyone

Abstract: In this paper we describe a technique for improving the performance of an information extraction system for speech data by explicitly modeling the errors in the recognizer output. The approach combines a statistical model of named entity states with a lattice representation of hypothesized words and errors annotated with recognition confidence scores. Additional refinements include the use of multiple error types, improved confidence estimation, and multipass processing. In combination, these techniques improve named entity recognition performance over a text-based baseline by 28%.
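
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes a toy two-state tagger (`O`, `NAME`) in which each recognized word is paired with a confidence score, and the emission probability mixes the hypothesized word with a generic error token `<ERR>`, weighted by that confidence. All state names, probabilities, the `<ERR>` token, and the functions `emission` and `viterbi` are hypothetical choices made for illustration.

```python
import math

STATES = ["O", "NAME"]

# Hypothetical toy parameters, not taken from the paper.
START = {"O": 0.8, "NAME": 0.2}
TRANS = {
    "O": {"O": 0.7, "NAME": 0.3},
    "NAME": {"O": 0.5, "NAME": 0.5},
}
EMIT = {
    "O": {"mister": 0.3, "said": 0.4, "<ERR>": 0.05},
    "NAME": {"palmer": 0.5, "<ERR>": 0.4},
}
FLOOR = 1e-4  # fallback probability for tokens a state has never seen


def emission(state, word, confidence):
    """Mix the hypothesized word with an error token, weighted by confidence."""
    p_word = EMIT[state].get(word, FLOOR)
    p_err = EMIT[state].get("<ERR>", FLOOR)
    return confidence * p_word + (1.0 - confidence) * p_err


def viterbi(hyps):
    """hyps: list of (word, confidence) pairs. Returns the best state sequence."""
    # Each column maps state -> (log score, best previous state).
    best = [{
        s: (math.log(START[s]) + math.log(emission(s, *hyps[0])), None)
        for s in STATES
    }]
    for word, conf in hyps[1:]:
        column = {}
        for s in STATES:
            prev, score = max(
                ((p, best[-1][p][0] + math.log(TRANS[p][s])) for p in STATES),
                key=lambda item: item[1],
            )
            column[s] = (score + math.log(emission(s, word, conf)), prev)
        best.append(column)

    # Backtrace from the best final state.
    state = max(STATES, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    # "bomber" stands in for a low-confidence misrecognition of a name.
    print(viterbi([("mister", 0.9), ("bomber", 0.2), ("said", 0.9)]))
```

With these toy numbers, the low-confidence word "bomber" is tagged `NAME` because most of its probability mass shifts to the error token, which the `NAME` state emits more readily than `O`. A purely word-based emission would have no such signal for an out-of-vocabulary misrecognition.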
