Multi-Pass Pronunciation Adaptation

Published: 01 Jan 2007, Last Modified: 15 May 2025 | ICASSP (4) 2007 | CC BY-SA 4.0
Abstract: A mapping between words and pronunciations (potential phonetic realizations) is a key component of speech recognition systems. Traditionally, this mapping has been encoded in a lexicon in which each pronunciation is transcribed by a linguist or generated by a grapheme-to-phoneme algorithm. For large-vocabulary recognition systems, this process is highly susceptible to errors. We present an off-line, data-driven algorithm that corrects suboptimal pronunciations using transcribed utterances. Unlike previous data-driven algorithms that struggle to balance acoustic representation and multi-speaker generalization, our multi-pass approach maximizes both criteria instead of compromising between the two. On an automated name dialing task, our multi-pass algorithm achieves a 70% error rate reduction compared to a baseline grapheme-to-phoneme generated lexicon.