Language diarization for conversational code-switch speech with pronunciation dictionary adaptation

Published: 2013, Last Modified: 15 May 2023, ChinaSIP 2013
Abstract: Language diarization is the task of automatically segmenting code-switch speech by language and recognizing the language of each segment. Towards this task, we developed a conversational Mandarin-English code-switch corpus spoken by Singaporean and Malaysian speakers. We also developed a Singapore-accent-specific pronunciation dictionary, with which we built a Singapore-accent phone recognizer to extract long-term context phonotactic features. Our experiments show that an accent-specific phone recognizer is essential for improving language diarization performance. Specifically, in the language diarization experiment, the phonotactic features generated by the Singapore-accent phone recognizer yield a 6.5% relative frame error rate reduction over those generated by the phone recognizer using the CMU dictionary. In addition, an ASR system using this dictionary achieved a 21% relative word error rate reduction on the Singapore English corpus over the system using the American-accent CMU dictionary.
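The results above are reported as relative error rate reductions. The following is a minimal sketch of how those metrics can be computed. It assumes that frame error rate (FER) means the fraction of frames whose hypothesized language label differs from the reference, which is a simplification because the paper's exact scoring protocol is not stated here. The function names and the toy Mandarin/English frame labels are hypothetical and are not data from the paper.

```python
from typing import Sequence


def frame_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Fraction of frames whose hypothesized language label differs from the reference.

    Assumption: reference and hypothesis are already frame-aligned label sequences.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")
    if not reference:
        raise ValueError("need at least one frame")
    errors = sum(r != h for r, h in zip(reference, hypothesis))
    return errors / len(reference)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error rate reduction of `improved` with respect to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline


# Toy, made-up frame labels ("M" = Mandarin, "E" = English); not results from the paper.
ref = list("MMMMEEEEMM")
sys_a = list("MMMEEEEMMM")  # hypothetical baseline system: 2 of 10 frames wrong
sys_b = list("MMMMEEEMMM")  # hypothetical improved system: 1 of 10 frames wrong

fer_a = frame_error_rate(ref, sys_a)
fer_b = frame_error_rate(ref, sys_b)
print(f"{relative_reduction(fer_a, fer_b):.0%}")  # prints "50%"
```

The same `relative_reduction` formula applies to the word error rate comparison: a relative reduction is measured against the baseline system's error rate, not as a difference in percentage points.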