Keywords: Arabic, dialect, MSA, cultural reasoning, dialogue
Abstract: Few conversational datasets capture culturally rich, dialectal contexts, which leaves a significant gap in the evaluation of cultural reasoning in LLMs. In Arabic NLP, most prior work focuses on Modern Standard Arabic (MSA) and short text snippets, overlooking the cultural nuances that naturally arise in dialogue. To address this gap, we introduce a culturally grounded conversational dataset covering 13 Arabic-speaking countries, in both MSA and the corresponding dialects, spanning 12 daily-life domains and 54 fine-grained subtopics. We define three tasks: (i) multiple-choice cultural reasoning, (ii) machine translation between MSA and dialects, and (iii) dialect-steering generation. Experiments with open-weight LLMs reveal substantial challenges: across all three tasks, models perform significantly worse on dialectal data than on MSA, highlighting the need for culturally aware dialogue systems.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: corpus creation, benchmarking, language resources, NLP datasets
Contribution Types: Approaches to low-resource settings, Data resources
Languages Studied: Modern Standard Arabic (MSA); Arabic dialects of Algeria, Libya, Morocco, Tunisia, Egypt, Sudan, Jordan, Lebanon, Palestine, Syria, KSA, UAE, and Yemen
Submission Number: 10215