Abstract: We propose a rescoring framework for speech recognition that incorporates acoustic-phonetic knowledge sources. The scores for all knowledge sources are generated by a collection of neural-network-based classifiers. Rescoring is then performed by combining these knowledge scores and using the combined score to reorder the candidate strings provided by state-of-the-art HMM-based speech recognizers. We report continuous phone recognition experiments on the TIMIT database. Our results indicate that classifying manner and place of articulation provides additional information for rescoring, and that accuracy improves over our best baseline speech recognizers with both context-independent and context-dependent phone models. The same technique can be extended to lattice rescoring and large-vocabulary continuous speech recognition.
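To make the rescoring step concrete, the following is a minimal sketch of reordering an N-best list with additional knowledge scores. The abstract does not state how the scores are combined, so the weighted linear combination of log-domain scores used here is an assumption. The names `Hypothesis`, `rescore`, `hmm_weight`, and the `"manner"` / `"place"` keys are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hypothesis:
    """One candidate phone string from the baseline recognizer (hypothetical type)."""
    phones: List[str]
    hmm_score: float                    # log-domain score from the HMM recognizer
    knowledge_scores: Dict[str, float]  # log-domain scores, e.g. from manner/place classifiers


def rescore(nbest: List[Hypothesis],
            weights: Dict[str, float],
            hmm_weight: float = 1.0) -> List[Hypothesis]:
    """Reorder candidates by a weighted sum of HMM and knowledge scores.

    Assumption: scores are combined linearly in the log domain; a knowledge
    source without an entry in `weights` contributes nothing.
    """
    def combined(h: Hypothesis) -> float:
        knowledge = sum(weights.get(name, 0.0) * score
                        for name, score in h.knowledge_scores.items())
        return hmm_weight * h.hmm_score + knowledge

    # Highest combined score first.
    return sorted(nbest, key=combined, reverse=True)


# Illustrative values only; these are not results from the paper.
nbest = [
    Hypothesis(["s", "ih", "t"], -10.0, {"manner": -4.0, "place": -5.0}),
    Hypothesis(["z", "ih", "t"], -10.5, {"manner": -1.0, "place": -2.0}),
]
reordered = rescore(nbest, weights={"manner": 0.5, "place": 0.5})
```

With these made-up scores, the second candidate has the lower HMM score but overtakes the first once the knowledge scores are added, which is the kind of reordering the framework relies on. In practice the weights would presumably be tuned on held-out data.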