AfriVox: Probing Multilingual and Accent Robustness of Speech LLMs

ACL ARR 2025 July Submission 544 Authors

28 Jul 2025 (modified: 19 Aug 2025) · ACL ARR 2025 July Submission · CC BY 4.0
Abstract: Recent advances in multimodal and speech-native large language models (LLMs) have delivered impressive speech recognition, translation, understanding, and question-answering capabilities for high-resource languages. However, African languages and non-native French or English accents remain dramatically underrepresented in benchmarks, limiting both the understanding and the applicability of leading LLMs for millions of francophone and anglophone users in low-resource settings. We present AfriVox, an open-source benchmark (including novel domain-specific and unscripted datasets) spanning 20 African languages, African-accented French, Arabic, and 100+ African English accents, contrasting leading multimodal speech LLMs with traditional unimodal automatic speech recognition (ASR) and speech translation (AST) models. Our analysis reveals significant variation in language coverage, surprising LLM translation performance gains (e.g., Gemini), robustness concerns with unscripted speech, and substantial performance disparities for "supported" African languages. We profile the strengths, limitations, and language support of each model, and conduct the first targeted fine-tuning of a modern speech LLM (Qwen2.5-Omni) for three Nigerian languages, exceeding the state of the art and achieving up to a 54% relative WER reduction alongside significant BLEU gains, offering practical guidance for implementers seeking to serve local language users.
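For readers unfamiliar with the headline metric, the snippet below is a minimal sketch of how relative WER reduction is typically computed, using the open-source jiwer package; the example transcripts and numbers are hypothetical and are not drawn from the paper.

```python
# Sketch of the WER / relative-WER-reduction arithmetic referenced in the
# abstract. The transcripts below are invented for illustration only.
import jiwer

reference = "the committee adjourned at noon"
baseline_hyp = "the comity adjourn at noon"        # 2 errors in 5 words
finetuned_hyp = "the committee adjourn at noon"    # 1 error in 5 words

wer_base = jiwer.wer(reference, baseline_hyp)      # 0.40
wer_ft = jiwer.wer(reference, finetuned_hyp)       # 0.20

# Relative WER reduction: (WER_base - WER_ft) / WER_base
rel_reduction = (wer_base - wer_ft) / wer_base
print(f"baseline WER={wer_base:.2f}, fine-tuned WER={wer_ft:.2f}, "
      f"relative reduction={rel_reduction:.0%}")   # 50%
```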
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Multilingualism and Cross-Lingual NLP, Resources and Evaluation, Speech Recognition
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: Afrikaans, Akan, Amharic, Arabic, English, French, Ga, Hausa, Igbo, Kinyarwanda, Luganda, Pedi, Sesotho, Shona, Swahili, Tswana, Twi, Xhosa, Yoruba, and Zulu
Previous URL: https://openreview.net/forum?id=MbkHmmqgT1
Explanation Of Revisions PDF: pdf
Reassignment Request Area Chair: No, I want the same area chair from our previous submission (subject to their availability).
Reassignment Request Reviewers: Yes, I want a different set of reviewers
Justification For Not Keeping Action Editor Or Reviewers: Previous reviewers were dismissive and did not read the paper carefully: they raised concerns that were already clearly addressed in the paper, characterized the results as "expected" while ignoring extensive experiments and analysis, and questioned the scientific contribution of the work without offering any concrete comments on the correctness of the results or the argumentation, citing only the limited perceived impact of the findings.
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: N/A
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: 3
B2 Discuss The License For Artifacts: Yes
B2 Elaboration: 1
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: 3
B4 Data Contains Personally Identifying Info Or Offensive Content: Yes
B4 Elaboration: Data contains publicly available parliamentary proceedings, which include the names of members of the senate and the legislative arm of government
B5 Documentation Of Artifacts: Yes
B5 Elaboration: 3
B6 Statistics For Data: Yes
B6 Elaboration: 3
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: 3
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: 3
C3 Descriptive Statistics: Yes
C3 Elaboration: 3
C4 Parameters For Packages: N/A
D Human Subjects Including Annotators: Yes
D1 Instructions Given To Participants: No
D1 Elaboration: The annotator task was transcription of recorded audio
D2 Recruitment And Payment: Yes
D2 Elaboration: 3
D3 Data Consent: Yes
D3 Elaboration: 3
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: No
E1 Information About Use Of Ai Assistants: N/A
Author Submission Checklist: yes
Submission Number: 544