From Child Adaptation to Adult Retention: Merging Specialized Arabic–English ASR Models Across Architectures
Keywords: Automatic speech recognition, children’s speech, model merging, catastrophic forgetting, adult–child ASR adaptation
Abstract: Automatic speech recognition (ASR) systems exhibit persistent performance disparities across age groups and speaker nativity, with children’s speech remaining a systematically underrepresented and challenging domain, even in high-resource languages. Existing adaptation strategies predominantly rely on fine-tuning, which often induces catastrophic forgetting and degrades performance on adult speech; these limitations are further amplified in bilingual children’s ASR, where robust cross-language generalization is required. In this work, we explore weight-space model merging as a principled framework for age-robust and language-inclusive speech modeling.
Starting from a shared multilingual base model, we fine-tune complementary child-adapted checkpoints and merge them using balanced weighting to preserve adult representations while incorporating age- and language-specific adaptations. Across all benchmarks, model merging consistently improves recognition accuracy for children while retaining, or in some cases improving, adult performance, outperforming fine-tuning and joint training baselines.
These results demonstrate that model merging provides a scalable and data-efficient alternative to fine-tuning for inclusive ASR across age groups, speaker nativity (L1 and L2 Arabic speakers), and languages (Arabic and English).
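The balanced weight-space merging described in the abstract amounts to averaging parameter tensors from checkpoints that share a common base architecture. The sketch below is a minimal illustration under that assumption; the checkpoint filenames and the PyTorch-style state-dict handling are hypothetical and do not correspond to the authors’ released code.

```python
# Minimal sketch of balanced weight-space merging for fine-tuned checkpoints
# that share the same base architecture. Paths and weights are illustrative.
import torch

def merge_state_dicts(state_dicts, weights=None):
    """Weighted average of parameter tensors; balanced (uniform) by default.

    Assumes all checkpoints have identical keys and floating-point parameters.
    """
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged

# Hypothetical child-adapted checkpoints fine-tuned from one multilingual base.
paths = ("child_arabic_ft.pt", "child_english_ft.pt")
ckpts = [torch.load(p, map_location="cpu") for p in paths]
torch.save(merge_state_dicts(ckpts), "merged_child_adult.pt")
```

Because the merge operates purely in weight space, it requires no additional training data and can be applied post hoc to any set of checkpoints derived from the same base model.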
Paper Type: Long
Research Area: Speech Processing and Spoken Language Understanding
Research Area Keywords: Automatic speech recognition, children’s speech, model merging, catastrophic forgetting, multilingual adaptation
Contribution Types: Approaches to low-resource settings, Data resources
Languages Studied: Arabic, English
Submission Number: 10615