TL;DR: The first Manchu ASR model, ManWav1.0, shows significant performance improvement when trained on augmented data compared to original data.
Abstract: This study addresses the widening gap in Automatic Speech Recognition (ASR) research between high resource and low resource languages, with a particular focus on Manchu, a severely underrepresented language. As an extremely low resource language, Manchu exemplifies the challenges faced by marginalized linguistic communities in accessing state-of-the-art technologies. In a pioneering effort, we introduce the first-ever Manchu ASR model, leveraging Wav2Vec 2.0-XLSR. This development demonstrates the adaptability of advanced ASR models to bridge the gap for low resource languages. The results of the first Manchu ASR model are promising, especially when our data augmentation method is employed. Wav2Vec 2.0-XLSR fine-tuned with augmented data achieves a 2%p drop in CER and a 13%p drop in WER compared to the same model fine-tuned with original data. This advancement not only marks a significant step in ASR research but also brings linguistic diversity into technological innovation.
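The reported gains are given in CER (character error rate) and WER (word error rate). As a point of reference for how such metrics are computed, below is a minimal pure-Python sketch of edit-distance-based CER/WER; this is a generic illustration, not the authors' evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences via dynamic programming."""
    dp = list(range(len(hyp) + 1))  # row for the empty reference prefix
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i  # prev holds dp[i-1][0]; new first column is i
        for j, h in enumerate(hyp, 1):
            # dp[j] is still the previous row's value dp[i-1][j];
            # dp[j-1] is already updated to the current row dp[i][j-1]
            prev, dp[j] = dp[j], min(
                dp[j] + 1,            # deletion
                dp[j - 1] + 1,        # insertion
                prev + (r != h),      # substitution (free if characters match)
            )
    return dp[-1]

def cer(ref, hyp):
    """Character error rate: edits per reference character."""
    return edit_distance(list(ref), list(hyp)) / len(ref)

def wer(ref, hyp):
    """Word error rate: edits per reference word."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())
```

For example, `wer("bi boo de bi", "bi bo de bi")` is 0.25 (one substituted word out of four reference words); "%p" in the abstract denotes an absolute percentage-point difference between two such rates.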
Paper Type: short
Research Area: Speech recognition, text-to-speech and spoken language understanding
Contribution Types: Approaches to low-resource settings, Publicly available software and/or pre-trained models, Data resources
Languages Studied: Manchu