Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation

ACL ARR 2025 February Submission 7193 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Language diversity presents a significant challenge in speech-to-text (S2T) tasks, such as automatic speech recognition and translation. Traditional multitask training approaches aim to address this by jointly optimizing multiple speech recognition and translation tasks across various languages. While models like Whisper, built on these strategies, demonstrate strong performance, they still face issues of high computational cost, language interference, suboptimal training configurations, and limited extensibility. To overcome these challenges, we introduce LoRS-Merging (low-rank and sparse model merging), a novel technique designed to efficiently integrate models trained on different languages or tasks while preserving performance and reducing computational overhead. LoRS-Merging combines low-rank and sparse pruning to retain essential structures while eliminating redundant parameters, mitigating language and task interference, and enhancing extensibility. Experimental results across a range of languages demonstrate that LoRS-Merging reduces the word error rate by 10% and improves BLEU scores by 4% compared to conventional multilingual multitask training baselines. Our findings suggest that model merging, particularly LoRS-Merging, is a scalable and effective complement to traditional multilingual training strategies for S2T applications.
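The abstract describes LoRS-Merging as combining low-rank approximation and sparse pruning of language- or task-specific models before merging them. The sketch below is a minimal illustration of that idea based only on this description: it computes per-model task vectors (fine-tuned weights minus base weights), applies a truncated-SVD low-rank approximation and magnitude-based sparsification to each, and averages the cleaned vectors back onto the base model. The function name, the rank/sparsity/scale parameters, and the choice of simple averaging are assumptions for illustration, not the paper's exact algorithm.

```python
import torch

def lors_merge(base_state, finetuned_states, rank=16, sparsity=0.9, scale=1.0):
    """Illustrative LoRS-style merge (assumed details, not the paper's exact method):
    low-rank approximation plus magnitude pruning of each task vector, then
    averaging the cleaned task vectors back onto the base model."""
    merged = {k: v.clone() for k, v in base_state.items()}
    for name, base_w in base_state.items():
        deltas = []
        for ft_state in finetuned_states:
            delta = ft_state[name] - base_w            # task vector for this fine-tuned model
            if delta.ndim == 2:                        # low-rank approximation via truncated SVD
                U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
                r = min(rank, S.numel())
                delta = U[:, :r] @ torch.diag(S[:r]) @ Vh[:r, :]
            # sparse pruning: keep only the largest-magnitude entries
            k = max(1, int((1.0 - sparsity) * delta.numel()))
            threshold = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
            delta = torch.where(delta.abs() >= threshold, delta, torch.zeros_like(delta))
            deltas.append(delta)
        merged[name] = base_w + scale * torch.stack(deltas).mean(dim=0)
    return merged
```

In a multilingual S2T setting, `base_state` would be a shared pretrained checkpoint (e.g. a Whisper-style model) and `finetuned_states` the per-language or per-task fine-tuned checkpoints; the low-rank and sparsity steps are what the abstract credits with removing redundant parameters and reducing interference before merging.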
Paper Type: Long
Research Area: Speech Recognition, Text-to-Speech and Spoken Language Understanding
Research Area Keywords: automatic speech recognition, speech translation, speech technologies
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: Catalan, German, Spanish, French, Italian, Indonesian, Dutch, Portuguese, Russian, Swedish
Submission Number: 7193