Abstract: Developing robust automatic speech recognition (ASR) systems for Arabic requires effective strategies for managing its diversity. Existing ASR systems mainly cover Modern Standard Arabic (MSA) and a few high-resource dialects, but fall short in coverage and generalization across the multitude of spoken variants. Code-switching with English and French is also common in different regions of the Arab world, which challenges the performance of monolingual Arabic models. In this work, we introduce a suite of ASR models optimized to recognize multiple variants of spoken Arabic, including MSA, various dialects, and code-switched speech. We provide open-source pre-trained models that cover data from 17 Arabic-speaking countries, fine-tuned MSA and dialectal ASR models that include at least 11 variants, and multilingual ASR models covering the embedded languages in code-switched utterances. Our open-source/open-weights models achieve the highest coverage and generalization for spoken Arabic and state-of-the-art (SOTA) performance on all Arabic ASR benchmarks.
Paper Type: Long
Research Area: Special Theme (conference specific)
Research Area Keywords: dialect coverage, code-switching, multilingual, speech recognition, pre-training
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Publicly available software and/or pre-trained models
Languages Studied: Arabic
Submission Number: 6427