Afri-MCQA: Multimodal Cultural Question Answering for African Languages

ACL ARR 2026 January Submission 5050 Authors

05 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Multimodal VQA, Cultural QA, African Languages, Low-resource languages
Abstract: Africa is home to over one-third of the world's languages, yet remains severely underrepresented in multimodal AI research. We introduce Afri-MCQA, the first multilingual cultural question-answering benchmark of its kind, containing 7.5k Q&A pairs across 15 African languages from 12 countries. The benchmark offers parallel text and speech modalities and was created entirely by native speakers. We find that models perform poorly across the evaluated cultures, with near-zero accuracy on open-ended VQA when queried in native-language text or speech. To isolate linguistic competence from cultural knowledge, we include control experiments targeting language ability alone, and we observe significant performance gaps between native languages and English in both text and speech. These findings underscore the pressing need for speech-first approaches, culturally grounded pretraining, and cross-lingual cultural transfer. We release Afri-MCQA to support more inclusive multimodal AI development.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Multimodal VQA, Cultural QA, African Languages, Low-resource languages
Contribution Types: Data resources, Data analysis
Languages Studied: Akan/Twi, Amharic, Chichewa, Hausa, Igbo, Kikuyu, Kinyarwanda, Luganda, Oromo, Setswana, Sesotho, Somali, Tigrinya, Yoruba, Zulu
Submission Number: 5050