Abstract: Existing multilingual vision-language (VL) benchmarks often cover only a handful of languages. Consequently, evaluations of large vision-language models (LVLMs) predominantly target high-resource languages, underscoring the need for evaluation data for low-resource languages. To address this limitation, we introduce MVL-SIB, a massively multilingual vision-language benchmark that evaluates both cross-modal and text-only topical matching across 205 languages, over 100 more than the most multilingual existing VL benchmarks encompass. We then benchmark a range of open-weight LVLMs together with GPT-4o(-mini) on MVL-SIB. Our results reveal that LVLMs struggle with cross-modal topic matching in lower-resource languages, performing no better than chance on languages like N'Koo. Our analysis further reveals that VL support in LVLMs declines disproportionately relative to textual support for lower-resource languages, as evidenced by a comparison of cross-modal and text-only topical matching performance. We further observe that open-weight LVLMs do not benefit from representing a topic with more than one image, suggesting that these models are not yet fully effective at handling multi-image tasks.
By correlating performance on MVL-SIB with other multilingual VL benchmarks, we highlight that MVL-SIB serves as a comprehensive probe of multilingual VL understanding in LVLMs.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: vision question answering,multilingual benchmarks,multilingual evaluation
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: Acehnese (Arabic script),Acehnese (Latin script),Mesopotamian Arabic,Ta’izzi-Adeni Arabic,Tunisian Arabic,Afrikaans,South Levantine Arabic,Akan,Amharic,North Levantine Arabic,Modern Standard Arabic,Modern Standard Arabic (Romanized),Najdi Arabic,Moroccan Arabic,Egyptian Arabic,Assamese,Asturian,Awadhi,Central Aymara,South Azerbaijani,North Azerbaijani,Bashkir,Bambara,Balinese,Belarusian,Bemba,Bengali,Bhojpuri,Banjar (Arabic script),Banjar (Latin script),Standard Tibetan,Bosnian,Buginese,Bulgarian,Catalan,Cebuano,Czech,Chokwe,Central Kurdish,Crimean Tatar,Welsh,Danish,German,Southwestern Dinka,Dyula,Dzongkha,Greek,English,Esperanto,Estonian,Basque,Ewe,Faroese,Fijian,Finnish,Fon,French,Friulian,Nigerian Fulfulde,Scottish Gaelic,Irish,Galician,Guarani,Gujarati,Haitian Creole,Hausa,Hebrew,Hindi,Chhattisgarhi,Croatian,Hungarian,Armenian,Igbo,Ilocano,Indonesian,Icelandic,Italian,Javanese,Japanese,Kabyle,Jingpho,Kamba,Kannada,Kashmiri (Arabic script),Kashmiri (Devanagari script),Georgian,Central Kanuri (Arabic script),Central Kanuri (Latin script),Kazakh,Kabiyè,Kabuverdianu,Khmer,Kikuyu,Kinyarwanda,Kyrgyz,Kimbundu,Northern Kurdish,Kikongo,Korean,Lao,Ligurian,Limburgish,Lingala,Lithuanian,Lombard,Latgalian,Luxembourgish,Luba-Kasai,Ganda,Luo,Mizo,Standard Latvian,Magahi,Maithili,Malayalam,Marathi,Minangkabau (Arabic script),Minangkabau (Latin script),Macedonian,Plateau Malagasy,Maltese,Meitei (Bengali script),Halh Mongolian,Mossi,Maori,Burmese,Dutch,N'Koo,Norwegian Nynorsk,Norwegian Bokmål,Nepali,Northern Sotho,Nuer,Nyanja,Occitan,West Central Oromo,Odia,Pangasinan,Eastern Panjabi,Papiamento,Western Persian,Polish,Portuguese,Dari,Southern Pashto,Ayacucho Quechua,Romanian,Rundi,Russian,Sango,Sanskrit,Santali,Sicilian,Shan,Sinhala,Slovak,Slovenian,Samoan,Shona,Sindhi,Somali,Southern Sotho,Spanish,Tosk Albanian,Sardinian,Serbian,Swati,Sundanese,Swedish,Swahili,Silesian,Tamil,Tatar,Telugu,Tajik,Tagalog,Thai,Tigrinya,Tamasheq (Latin script),Tamasheq (Tifinagh script),Tok Pisin,Tswana,Tsonga,Turkmen,Tumbuka,Turkish,Twi,Central Atlas Tamazight,Uyghur,Ukrainian,Umbundu,Urdu,Northern Uzbek,Venetian,Vietnamese,Waray,Wolof,Xhosa,Eastern Yiddish,Yoruba,Yue Chinese,Chinese (Simplified),Chinese (Traditional),Standard Malay,Zulu
Submission Number: 7759