MLSS: Mandarin English Code-Switching Speech Recognition via Mutual Learning-Based Semi-Supervised Method

Published: 01 Jan 2025 · Last Modified: 23 Jun 2025 · IEEE Signal Process. Lett. 2025 · CC BY-SA 4.0
Abstract: Code-switching, the alternating use of two or more languages within or between utterances, is a common phenomenon in multilingual communities. Recently, code-switching natural language processing and automatic speech recognition (ASR) have attracted considerable research attention. However, a major obstacle for these studies is the scarcity of transcribed data. In this letter, we propose a novel semi-supervised learning (SSL) approach to address this problem, namely the Mutual Learning-Based Semi-Supervised Method (MLSS). MLSS employs two networks that are fine-tuned in an interleaved manner on a combination of the transcribed dataset and pseudo-labeled data generated by the other network. This iterative fine-tuning process repeats until all unlabeled data have been selected for training or a maximum number of iterations is reached. By incorporating mutual learning between the two networks, our approach effectively leverages the knowledge acquired in previous iterations during training and combines the knowledge of both networks during decoding, resulting in a more robust and effective approach. To evaluate the effectiveness of the proposed method, we conduct experiments on the SEAME Mandarin-English code-switching corpus. The experimental results show that our approach outperforms other state-of-the-art methods, achieving a Mixed Error Rate (MER) of 15.6%/21.1% on the test$_{man}$/test$_{sge}$ sets.
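The abstract describes an iterative pseudo-labeling loop in which each network is fine-tuned on the labeled set plus pseudo-labels produced by the other network. Below is a minimal sketch of such a mutual-learning loop, assuming caller-supplied helpers (`transcribe_fn`, `select_fn`, `finetune_fn`) for decoding, confidence-based selection, and fine-tuning; these names, and the stopping criterion shown, are illustrative assumptions rather than the paper's implementation.

```python
# Hypothetical sketch of a mutual-learning semi-supervised loop.
# transcribe_fn(model, audio) -> hypothesis text
# select_fn(list of (audio, hypothesis)) -> confident subset (criterion assumed)
# finetune_fn(model, list of (audio, text)) -> updated model

def mlss_train(model_a, model_b, labeled, unlabeled,
               transcribe_fn, select_fn, finetune_fn, max_iters=10):
    """Interleaved fine-tuning of two ASR models on labeled data plus
    pseudo-labels generated by the *other* model."""
    remaining = list(unlabeled)
    for _ in range(max_iters):
        if not remaining:
            break  # stop once all unlabeled utterances have been used

        # Each model transcribes the remaining unlabeled audio.
        pseudo_a = [(x, transcribe_fn(model_a, x)) for x in remaining]
        pseudo_b = [(x, transcribe_fn(model_b, x)) for x in remaining]

        # Keep only confident pseudo-labels.
        sel_a = select_fn(pseudo_a)
        sel_b = select_fn(pseudo_b)

        # Interleaved fine-tuning: each network trains on the labeled set
        # combined with pseudo-labels from the other network.
        model_a = finetune_fn(model_a, labeled + sel_b)
        model_b = finetune_fn(model_b, labeled + sel_a)

        # Remove utterances already covered by a selected pseudo-label.
        used = {x for x, _ in sel_a} | {x for x, _ in sel_b}
        remaining = [x for x in remaining if x not in used]

    return model_a, model_b
```

At decoding time, the abstract indicates that the knowledge of both networks is combined; how that combination is performed is not specified in this abstract.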