Text-Derived Language Identity Incorporation for End-to-End Code-Switching Speech Recognition

Published: 11 Oct 2023, Last Modified: 17 Oct 2023, EMNLP 2023 Workshop CALCS
Keywords: code-switching; automatic speech recognition; language identification; language model; transformer
TL;DR: We introduce a novel approach to learn language identity from pure text data via a dedicated language identity language model, and explore two strategies to integrate the text-derived language identities into an end-to-end ASR system.
Abstract: Recognizing code-switching (CS) speech often presents challenges for an automatic speech recognition (ASR) system due to the limited linguistic context in short monolingual segments, resulting in language confusion. To mitigate this issue, language identity (LID) is often integrated into the speech recognition system to provide additional linguistic context. However, previous works predominantly focus on extracting language identity from speech signals. We introduce a novel approach to learn language identity from pure text data via a dedicated language identity language model. In addition, we explore two strategies, LID state fusion and language posterior biasing, to integrate the text-derived language identities into the end-to-end ASR system. By incorporating hypothesized language identities, our ASR system gains crucial contextual cues, effectively capturing language transitions and patterns within code-switched utterances. We conduct speech recognition experiments on the SEAME corpus and demonstrate the effectiveness of our proposed methods. Our results reveal significantly improved transcriptions in code-switching scenarios, underscoring the potential of text-derived LID in enhancing code-switching speech recognition.
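The abstract does not give the exact formulation of language posterior biasing, so the following is only a minimal illustrative sketch of the general idea, under the assumption that each candidate token carries a language tag and that a text-derived LID language model scores the next language from the partial transcription hypothesis; the function and variable names are hypothetical and not taken from the paper.

```python
import numpy as np


def bias_token_logprobs(token_logprobs, token_langs, lid_logprobs, weight=0.3):
    """Illustrative language posterior biasing (hypothetical sketch, not the paper's method).

    token_logprobs: dict token -> ASR log-probability for the next decoding step
    token_langs:    dict token -> language tag ("zh" or "en") of that token
    lid_logprobs:   dict lang  -> log-probability of the next language, as predicted
                    by a text-derived LID language model from the partial hypothesis
    weight:         interpolation weight applied to the LID score
    """
    biased = {}
    for tok, lp in token_logprobs.items():
        lang = token_langs[tok]
        # Boost tokens whose language the LID-LM considers likely to come next.
        biased[tok] = lp + weight * lid_logprobs[lang]
    return biased


# Toy example: the LID-LM expects an English word next, so English tokens
# receive a boost when rescoring beam-search candidates.
token_logprobs = {"然后": np.log(0.40), "then": np.log(0.35), "就": np.log(0.25)}
token_langs = {"然后": "zh", "then": "en", "就": "zh"}
lid_logprobs = {"zh": np.log(0.3), "en": np.log(0.7)}
print(bias_token_logprobs(token_logprobs, token_langs, lid_logprobs))
```

In this toy setting the LID-LM's preference for English narrows the gap between "然后" and "then", which is the kind of contextual cue the abstract describes for capturing language transitions within code-switched utterances.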
Submission Type: Regular Long Paper (8 pages)
Submission Number: 5