Abstract: Low-resource ASR remains a challenging problem, especially for languages like Arabic that exhibit wide dialectal variation and limited labeled data. We propose context-aware prompting strategies to adapt OpenAI's Whisper for Arabic speech recognition without retraining. Our methods include decoder prompting with first-pass transcriptions or retrieved utterances, and encoder prefixing using speech synthesized in the target speaker’s voice. We introduce techniques such as prompt reordering, speaker-aware prefix synthesis, and modality-specific retrieval (lexical, semantic, acoustic) to improve transcription in real-world, zero-shot settings. Evaluated on nine Arabic linguistic conditions, our approach reduces WER by up to 22.3% on Modern Standard Arabic and 9.2% on dialectal speech, significantly mitigating hallucinations and speaker mismatch.
Paper Type: Long
Research Area: Speech Recognition, Text-to-Speech and Spoken Language Understanding
Research Area Keywords: automatic speech recognition, speech technologies, speech recognition, NLP in resource-constrained settings, model bias/unfairness mitigation, multilingualism, domain adaptation
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Publicly available software and/or pre-trained models, Data analysis
Languages Studied: Arabic (standard Arabic, Dialects, Accented Arabic)
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: N/A
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: 1, 2, 3
B2 Discuss The License For Artifacts: N/A
B3 Artifact Use Consistent With Intended Use: N/A
B4 Data Contains Personally Identifying Info Or Offensive Content: N/A
B5 Documentation Of Artifacts: Yes
B5 Elaboration: 3
B6 Statistics For Data: Yes
B6 Elaboration: 4.1
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: A.2
C2 Experimental Setup And Hyperparameters: N/A
C3 Descriptive Statistics: Yes
C3 Elaboration: 4, 5
C4 Parameters For Packages: Yes
C4 Elaboration: A2
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: No
E1 Elaboration: We just used AI assistans for paraphrasing as English is not our first langauge.
Author Submission Checklist: yes
Submission Number: 863
Loading