Language-Aware and Language-Agnostic Multilingual Speech Recognition with a Single Model

Published: 01 Jan 2025 · Last Modified: 23 Jun 2025 · ICPRAM 2025 · CC BY-SA 4.0
Abstract: In recent years, there has been increasing interest in multilingual speech recognition systems, in which a single model can transcribe speech in multiple languages. An additional benefit of multilingual learning is that it enables cross-lingual transfer, which often improves performance, especially for low-resource languages. On the other hand, multilingual models suffer from errors caused by confusion between languages. This problem can be mitigated by providing the language identity as an additional input to the model. In this research, we carry out experiments using a modern state-of-the-art ASR architecture based on a pretrained multilingual wav2vec 2.0 model with adapter modules trained for the downstream task, and confirm that multilingual supervised learning with language identifiers is a viable method for improving the system's overall performance. Furthermore, we find that training with language identifiers still yields a model with better average performance.
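To make the conditioning idea concrete, the sketch below shows one common way a language identifier can be injected into a frozen encoder's features via a lightweight adapter: a learned per-language embedding is added to the features before a bottleneck projection with a residual connection. This is a minimal illustration, not the paper's actual implementation; the dimensions, the additive-conditioning choice, and the NumPy forward pass are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: encoder hidden width, adapter bottleneck, language count.
HIDDEN, BOTTLENECK, NUM_LANGS = 8, 4, 3

# Learned language-identity embeddings (one vector per language).
lang_embed = rng.normal(size=(NUM_LANGS, HIDDEN))

# Adapter weights: down-projection and up-projection around a nonlinearity.
W_down = rng.normal(size=(HIDDEN, BOTTLENECK))
W_up = rng.normal(size=(BOTTLENECK, HIDDEN))

def adapter_forward(features: np.ndarray, lang_id=None) -> np.ndarray:
    """Condition frozen encoder features on an optional language identifier.

    When lang_id is None, the model runs language-agnostically: the
    conditioning vector is simply omitted.
    """
    x = features
    if lang_id is not None:
        x = x + lang_embed[lang_id]          # language-aware conditioning
    h = np.maximum(x @ W_down, 0.0)          # down-project + ReLU
    return features + h @ W_up               # up-project + residual connection

# Toy sequence of encoder outputs, shape (time, hidden).
frames = rng.normal(size=(5, HIDDEN))
aware = adapter_forward(frames, lang_id=1)
agnostic = adapter_forward(frames, lang_id=None)
print(aware.shape, agnostic.shape)  # (5, 8) (5, 8)
```

In this formulation the same adapter serves both modes: supplying a language identifier specializes the features toward one language, while omitting it falls back to language-agnostic decoding.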