Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks

Published: 22 Aug 2024 · Last Modified: 16 Feb 2025 · ACL 2024 · CC BY 4.0
Abstract: Multimodal large language models (MLLMs) have proven effective in a wide range of tasks that require complex reasoning and linguistic comprehension. However, due to a lack of high-quality multimodal resources in languages other than English, the success of MLLMs remains relatively limited to English-based settings. This poses significant challenges in developing comparable models for other languages, even those with large speaker populations, such as Arabic. To alleviate this challenge, we introduce a comprehensive family of Arabic MLLMs, dubbed Peacock, with strong vision and language capabilities. Through comprehensive qualitative and quantitative analysis, we demonstrate the solid performance of our models on various visual reasoning tasks and further show their emerging dialectal potential. Additionally, we introduce Henna, a new benchmark specifically designed for assessing MLLMs on aspects related to Arabic culture, laying the first stone for culturally-aware Arabic MLLMs. The GitHub repository for the Peacock project is available at https://github.com/UBC-NLP/peacock.