Afri-MCQA: Multimodal Cultural Question Answering for African Languages

ACL ARR 2026 January Submission 5050 Authors

05 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Multimodal VQA, Cultural QA, African Languages, Low-resource languages
Abstract: Africa is home to over one-third of the world's languages, yet remains severely underrepresented in multimodal AI research. We introduce Afri-MCQA, the first multilingual cultural question-answering benchmark of its kind, containing 7.5k Q&A pairs across 15 African languages from 12 countries. The benchmark offers parallel text and speech modalities and was created entirely by native speakers. We find that models perform poorly across the evaluated cultures, with near-zero accuracy on open-ended VQA when queried in native-language text or speech. To isolate linguistic competence from cultural knowledge, we include control experiments targeting language ability alone, and we observe significant performance gaps between native languages and English in both text and speech. These findings underscore the pressing need for speech-first approaches, culturally grounded pretraining, and cross-lingual cultural transfer. We release Afri-MCQA to support more inclusive multimodal AI development.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Multimodal VQA, Cultural QA, African Languages, Low-resource languages
Contribution Types: Data resources, Data analysis
Languages Studied: Akan/Twi, Amharic, Chichewa, Hausa, Igbo, Kikuyu, Kinyarwanda, Luganda, Oromo, Setswana, Sesotho, Somali, Tigrinya, Yoruba, Zulu
Submission Number: 5050