Afrispeech Semantics: Evaluating Audio–Semantic Reasoning in Spoken Language Models Across Domains and Accents

ACL ARR 2026 January Submission 8933 Authors

06 Jan 2026 (modified: 20 Mar 2026) · CC BY 4.0
Keywords: Speech processing, Audio language models, Speech reasoning, Accented speech, Low-resource speech
Abstract: Audio language models (ALMs) are increasingly used for speech-based understanding, yet their ability to perform semantic reasoning beyond transcription, text-to-audio retrieval, captioning, and question answering remains insufficiently benchmarked. In particular, the effects of accent variation, domain shift, and semantic over-inference on audio reasoning are poorly understood. We evaluate ALMs across five semantic and paralinguistic reasoning tasks: entailment, consistency, plausibility, accent drift, and accent restraint. Collectively, these tasks assess a model's ability to reason over spoken audio as the primary evidence source: whether a textual hypothesis is entailed by, contradicted by, or left undetermined given the audio; whether statements align or conflict with the spoken content; whether claims are plausible given the discourse; and whether model predictions remain stable, or are appropriately constrained, across accent variation. Our findings highlight critical limitations in current audio reasoning evaluations and aim to provide guidance for more robust and equitable ALM design and assessment.
Paper Type: Long
Research Area: Speech Processing and Spoken Language Understanding
Research Area Keywords: Speech Recognition, Text-to-Speech and Spoken Language Understanding
Contribution Types: Model analysis & interpretability, Reproduction study, Approaches to low-resource settings, Approaches to low-compute settings & efficiency, Data resources, Surveys
Languages Studied: African-accented English
Submission Number: 8933