A Study on Regularization-Based Continual Learning Methods for Indic ASR

ACL ARR 2025 May Submission 4234 Authors

19 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · License: CC BY 4.0
Abstract: India's linguistic diversity poses a challenge to the development of inclusive Automatic Speech Recognition (ASR) systems. Traditional multilingual models, which require simultaneous access to data for all languages, are impractical when data arrives sequentially or is subject to privacy constraints. Continual Learning (CL) enables models to learn new languages sequentially without catastrophically forgetting prior knowledge. This paper investigates CL for ASR on Indian languages using a subset of the IndicSUPERB benchmark. We employ a Conformer-based hybrid RNNT-CTC model, initially pretrained on Hindi and subsequently trained incrementally on eight additional Indian languages, yielding a nine-language sequence in total. We evaluate three prominent regularization- and distillation-based CL strategies: Elastic Weight Consolidation (EWC), Memory Aware Synapses (MAS), and Learning without Forgetting (LwF), chosen for their suitability in no-replay, privacy-conscious scenarios. Performance is analyzed using Word Error Rate (WER) for both the RNNT and CTC decoding paths on clean and noisy data, and knowledge retention is measured via Backward Transfer. We also vary the number of training epochs per task (1, 2, 5, and 10). Compared against naive fine-tuning, the results demonstrate the efficacy of CL in mitigating forgetting and support scalable ASR for diverse Indian languages under realistic constraints. The code is available at \href{https://anonymous.4open.science/r/Indic-CL-ASR-9FF7}{https://anonymous.4open.science/r/Indic-CL-ASR-9FF7}
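The abstract names Backward Transfer without defining it; a standard definition (Lopez-Paz and Ranzato, 2017), adapted here under the assumption that lower WER is better, is

\[
\mathrm{BWT} = \frac{1}{T-1} \sum_{i=1}^{T-1} \left( \mathrm{WER}_{i,i} - \mathrm{WER}_{T,i} \right),
\]

where \(\mathrm{WER}_{t,i}\) is the WER on language \(i\) after training on the \(t\)-th language in the sequence and \(T = 9\); negative values indicate forgetting. The paper's exact sign convention may differ.

Similarly, the following is a minimal PyTorch sketch of the EWC penalty the paper evaluates, not the authors' implementation; the function name and the `fisher`/`old_params` snapshot dictionaries are illustrative assumptions:

```python
import torch

def ewc_penalty(model, fisher, old_params, lam=1.0):
    """Quadratic EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    `fisher` and `old_params` are assumed to map parameter names to a
    diagonal Fisher estimate and a parameter snapshot saved after the
    previous language (hypothetical bookkeeping, not from the paper).
    """
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for name, param in model.named_parameters():
        if name in fisher:  # penalize drift only on tracked parameters
            penalty = penalty + (fisher[name] * (param - old_params[name]).pow(2)).sum()
    return 0.5 * lam * penalty
```

When training on each new language, this term would be added to the hybrid RNNT-CTC task loss so that parameters deemed important for earlier languages resist large updates.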
Paper Type: Short
Research Area: Speech Recognition, Text-to-Speech and Spoken Language Understanding
Research Area Keywords: automatic speech recognition, continual learning, security and privacy, robustness, transfer, NLP in resource-constrained settings
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Approaches to low-compute settings (efficiency), Position papers
Languages Studied: Hindi, Bengali, Marathi, Telugu, Tamil, Urdu, Gujarati, Kannada, Odia
Submission Number: 4234