Abstract: Recent advances in large multilingual language models have enabled impressive zero-shot cross-lingual transfer capabilities. However, their performance remains uneven across languages, with underrepresented languages often lagging behind. While prior research has explored typological similarities to explain performance disparities, it has largely ignored external factors such as resource availability and has primarily focused on encoder-only models. In this study, we investigate the interplay between typological features and resource-related factors in zero-shot reading comprehension tasks using decoder-only models. Specifically, we evaluate the performance of GPT-3.5 Turbo, Qwen-2.5B, and Aya-Expanse on the Belebele benchmark. We conduct a series of correlation and regression analyses to examine: (1) the influence of English similarity on transfer performance, (2) whether resource availability acts as a spurious or explanatory factor, and (3) which typological features most significantly predict multilingual model performance. Our findings offer deeper insight into the factors that drive cross-lingual generalization, with implications for improving model equity across languages.
Paper Type: Short
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: typology, analysis, cross-lingual transfer, belebele, reading comprehension
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: Afar, Amharic, Arabic, Assamese, Azerbaijani, Bambara, Bashkir, Basque, Belarusian, Bengali, Bhojpuri, Bosnian, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Chinese (Simplified), Chinese (Traditional), Croatian, Czech, Danish, Dutch, English, Estonian, Ewe, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hungarian, Icelandic, Igbo, Ilocano, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Kinyarwanda, Korean, Kurdish (Kurmanji), Kyrgyz, Lao, Latvian, Lithuanian, Luganda, Luxembourgish, Macedonian, Maithili, Malagasy, Malay, Malayalam, Maltese, Marathi, Meiteilon (Manipuri), Mongolian, Nepali, Northern Sotho, Norwegian, Nyanja, Odia, Oromo, Pashto, Persian, Polish, Portuguese, Punjabi, Quechua, Romanian, Russian, Samoan, Scots Gaelic, Serbian, Sesotho, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Tigrinya, Turkish, Turkmen, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, Western Frisian, Xhosa, Yiddish, Yoruba, Zulu
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: N/A
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Section 3
B2 Discuss The License For Artifacts: No
B2 Elaboration: All resources used are released under a public-use license.
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: Section 3
B4 Data Contains Personally Identifying Info Or Offensive Content: N/A
B5 Documentation Of Artifacts: Yes
B5 Elaboration: Section 3
B6 Statistics For Data: No
B6 Elaboration: We referenced the original dataset paper. We did not have enough space in a short paper to include the statistics.
C Computational Experiments: Yes
C1 Model Size And Budget: No
C1 Elaboration: We used a standard library for inference only, so the computational budget and requirements are minimal.
C2 Experimental Setup And Hyperparameters: N/A
C2 Elaboration: Inference-only
C3 Descriptive Statistics: Yes
C3 Elaboration: Section 4
C4 Parameters For Packages: Yes
C4 Elaboration: Section 3
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: No
E1 Information About Use Of Ai Assistants: N/A
Author Submission Checklist: yes
Submission Number: 328