Evaluating Speech Foundation Models for Automatic Speech Recognition in the Low-Resource Kanyen'kéha Language
Abstract: Despite recent progress in automatic speech recognition (ASR) and speech foundation models (SFMs) for widely spoken languages, their application to low-resource Indigenous languages remains limited. To address this gap, this paper presents a systematic evaluation of SFMs for ASR development in Kanyen'kéha, a polysynthetic Iroquoian language that is structurally and typologically distinct from widely studied languages. To tackle the challenges posed by limited data and extensive vocabulary variation, we further investigate the impact of incorporating in-domain synthesized data and external language models during cross-lingual transfer learning. Experiments on the low-resource Kanyen'kéha corpus under various train/test splits show that the best system achieves a word error rate (WER) of 13.73% and a character error rate (CER) of 2.21% on the test set, which has a 59.2% out-of-vocabulary (OOV) rate. Excluding easily correctable errors further reduces the WER and CER to 10.36% and 1.76%, respectively, demonstrating the system's potential to support language documentation and revitalization.
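The abstract reports results as WER, CER, and OOV rate. The sketch below shows how these metrics are conventionally computed. It is not the authors' evaluation code. It assumes the following:

- Tokenization is whitespace-based.
- Spaces count as characters for CER.
- OOV rate is measured per test token against the training vocabulary.

Toolkits differ on these choices, and text normalization also affects the scores. The function names (`edit_distance`, `wer`, `cer`, `oov_rate`) are illustrative only.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of ref item
                curr[j - 1] + 1,         # insertion of hyp item
                prev[j - 1] + (r != h),  # substitution (0 cost if match)
            )
        prev = curr
    return prev[-1]


def wer(refs: list[str], hyps: list[str]) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    return errors / sum(len(r.split()) for r in refs)


def cer(refs: list[str], hyps: list[str]) -> float:
    """Character error rate: character-level edits over reference length.
    Assumption: spaces are counted as characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


def oov_rate(train_texts: list[str], test_texts: list[str]) -> float:
    """Fraction of test tokens whose word type never occurs in training text."""
    vocab = {w for t in train_texts for w in t.split()}
    test_tokens = [w for t in test_texts for w in t.split()]
    return sum(w not in vocab for w in test_tokens) / len(test_tokens)


if __name__ == "__main__":
    refs, hyps = ["the cat sat"], ["the cat sit"]
    print(wer(refs, hyps))  # one substituted word out of three
    print(cer(refs, hyps))  # one substituted character out of eleven
    print(oov_rate(["the cat sat"], ["the dog sat"]))  # "dog" is unseen
```

A polysynthetic language packs much of its meaning into long, highly inflected words. A single wrong morpheme therefore makes the whole word count as one WER error, even though only a few characters differ. This is consistent with the large gap between the reported WER and CER.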
External IDs: dblp:conf/interspeech/GengLPJBGCKTLMJ25