AfriSpeech-MultiBench: A Verticalized Multidomain Multicountry Benchmark Suite for African Accented English ASR

ACL ARR 2025 July Submission 1377 Authors

29 Jul 2025 (modified: 20 Aug 2025) · ACL ARR 2025 July Submission · CC BY 4.0
Abstract: Recent advances in speech‑enabled AI, including Google's NotebookLM and OpenAI's speech-to-speech API, are driving widespread interest in voice interfaces across sectors such as finance, health, agritech, legal services, and call centers in both the global north and south. Despite this momentum, no publicly available application-specific model evaluation caters to Africa's linguistic diversity. We present **AfriSpeech‑MultiBench**, the first domain‑specific evaluation suite covering over 100 African English accents from 10+ countries across six application domains: Finance, Legal, Medical, General dialogue, Call Center, and Named Entities. We benchmark a diverse range of open and closed, unimodal ASR and multimodal LLM-based speech recognition systems using both scripted and unscripted conversations drawn from various open African-accented English speech datasets. Our empirical analysis reveals systematic variation: open‑source ASR excels in scripted contexts but degrades on noisy, non‑native dialogue; multimodal LLMs are more accent‑robust yet struggle with domain‑specific named entities; proprietary models deliver high accuracy on clean speech but vary significantly by country and domain. Smaller models fine‑tuned on African English achieve competitive accuracy in health and on named entities, a practical advantage for localized deployments. By releasing this benchmark, we empower practitioners and researchers to select voice technologies suited to African use cases, fostering inclusive voice applications for underserved communities.
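ASR benchmarks of this kind are typically scored with word error rate (WER), the word-level edit distance between a reference transcript and a model's hypothesis, normalized by reference length. The paper does not specify its scoring code; below is a minimal self-contained sketch of the standard WER computation (the function name and example strings are illustrative, not from the benchmark):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference length,
    computed via word-level Levenshtein edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub_cost,  # match / substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution and one deletion against a 4-word reference -> WER 0.5
print(wer("please transfer ten naira", "please transfer ten"))
```

In practice, benchmark suites normalize text (casing, punctuation, numerals) before scoring and average WER per domain and per accent, which is how the systematic variation described above would surface.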
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Resources and Evaluation, NLP Applications, Speech Recognition, Text-to-Speech and Spoken Language Understanding
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings
Languages Studied: English
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: N/A
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: 3
B2 Discuss The License For Artifacts: Yes
B2 Elaboration: 3
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: 3
B4 Data Contains Personally Identifying Info Or Offensive Content: N/A
B4 Elaboration: 3
B5 Documentation Of Artifacts: Yes
B5 Elaboration: 3
B6 Statistics For Data: Yes
B6 Elaboration: 3
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: 4
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: 4
C3 Descriptive Statistics: Yes
C3 Elaboration: 4
C4 Parameters For Packages: N/A
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: No
E1 Information About Use Of Ai Assistants: N/A
Author Submission Checklist: yes
Submission Number: 1377