Real-time ASR Customization via Hypotheses Re-ordering: A Comparative Study of Different Scoring Functions
Abstract: General-purpose automatic speech recognizers (ASRs) require customization to the domain and context to achieve practically acceptable accuracy when used as part of voice digital assistants. Further, such general-purpose ASRs typically output multiple alternative hypotheses for the same input utterance. In this paper, we consider the hypothesis re-ordering framework and evaluate the impact of three scoring functions for re-ordering the hypotheses (phoneme-based, character-based, and word-based), determining their respective strengths and weaknesses. Based on our intuitions and experimental validation, we find that phoneme-based scoring is best for closed-domain contexts, while character-based and word-based scoring do better in more open-domain contexts. Our results show that character-based scoring yields the largest word-error-rate improvement over general-purpose ASRs for voice assistants used in a classroom context. Our analysis also reveals that character-based scoring is preferred for shorter utterances, while word-based scoring is preferred for longer ones.
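The re-ordering framework described in the abstract can be sketched as follows. This is a minimal illustrative implementation, not the paper's exact method: it assumes a domain lexicon of expected phrases and realizes character-based scoring as the minimum character-level edit distance between a hypothesis and any lexicon phrase; the lexicon and the n-best list below are invented examples.

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def char_score(hypothesis: str, lexicon: list[str]) -> int:
    """Score a hypothesis by its closest domain phrase (lower is better)."""
    return min(edit_distance(hypothesis, phrase) for phrase in lexicon)

def reorder(nbest: list[str], lexicon: list[str]) -> list[str]:
    """Stably re-sort the ASR n-best list by character-based score."""
    return sorted(nbest, key=lambda h: char_score(h, lexicon))

# Hypothetical closed-domain lexicon for a classroom voice assistant.
lexicon = ["open the quiz", "show the syllabus", "start attendance"]
# Hypothetical n-best list from a general-purpose ASR.
nbest = ["show the still a bus", "show the syllabus", "show this ill abuts"]
print(reorder(nbest, lexicon)[0])  # prints "show the syllabus"
```

A phoneme-based variant would compute the same distance over phoneme sequences (after grapheme-to-phoneme conversion), and a word-based variant over word tokens; only the unit of comparison changes.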
Paper Type: short