Abstract: Multimodal large language models (MLLMs) have proven effective in a wide range of tasks that require complex reasoning and linguistic comprehension. However, due to a lack of high-quality multimodal resources in languages other than English, the success of MLLMs remains relatively limited to English-based settings. This poses significant challenges in developing comparable models for other languages, even those with large speaker populations, such as Arabic. To alleviate this challenge, we introduce a comprehensive family of Arabic MLLMs, dubbed Peacock, with strong vision and language capabilities. Through extensive qualitative and quantitative analysis, we demonstrate the solid performance of our models on various visual reasoning tasks and further show their emerging dialectal potential. Additionally, we introduce Henna, a new benchmark specifically designed for assessing MLLMs on aspects related to Arabic culture, laying the foundation for culturally aware Arabic MLLMs. The GitHub repository for the Peacock project is available at https://github.com/UBC-NLP/peacock.