Lightly supervised discriminative training of grapheme models for improved sentence-level alignment of speech and text data

Published: 01 Jan 2013, Last Modified: 15 May 2025, INTERSPEECH 2013, CC BY-SA 4.0
Abstract: This paper introduces a method for lightly supervised discriminative training using maximum mutual information (MMI) to improve the alignment of speech and text data for use in training HMM-based TTS systems for low-resource languages. In TTS applications, due to the use of long-span contexts, it is important to select training utterances whose transcriptions are wholly correct. In a low-resource setting, when using poorly trained grapheme models, we show that MMI discriminative training at the grapheme level enables us to increase the amount of correctly aligned data by 40%, while maintaining a 7% sentence error rate and 0.8% word error rate. We present the procedure for lightly supervised discriminative training with regard to the objective of minimising sentence error rate.