How Do Multilingual Language Models Handle Multiple Languages?
Keywords: Natural Language Processing
Abstract: Multilingual language models (MLMs) have advanced rapidly and are now central to natural language processing across diverse languages. However, their behavior remains uneven across high-resource and low-resource settings, and standard downstream benchmarks often fail to explain why performance gaps persist. This paper presents a systematic evaluation of how multilingual models, including BLOOM-1.7B and Qwen2, represent and transfer linguistic knowledge across languages. We study three complementary dimensions. First, we evaluate cross-lingual semantic consistency by measuring whether embeddings of semantically equivalent words remain aligned across languages. Second, we probe internal representations using sentence similarity and named entity recognition (NER) to examine how linguistic information is distributed across model layers. Third, we assess cross-lingual transfer using natural language inference (NLI) on XNLI, testing whether knowledge learned from English generalizes to lower-resource languages such as Arabic and Swahili. Our results reveal clear disparities between resource-rich and less-represented languages, with typologically distant languages showing weaker semantic alignment and larger transfer degradation. Layer-wise analyses further show that representational quality is not uniform across depth and that model architectures differ substantially in how well they preserve multilingual information. By combining embedding analysis, probing, quantitative evaluation, and visualization, this study provides a more fine-grained account of multilingual model behavior and highlights practical directions for improving inclusivity and robustness in multilingual NLP.
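The cross-lingual semantic consistency measure described above can be illustrated with a minimal sketch. It assumes (our assumption; the abstract does not name the metric) that alignment is scored separately at each layer by the mean cosine similarity of translation-pair embeddings and by nearest-neighbour retrieval accuracy (precision@1). The function names and the toy two-dimensional vectors are hypothetical and are not taken from BLOOM-1.7B, Qwen2, or the paper's results.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Embeddings = Dict[str, Vector]  # word -> embedding for one language at one layer


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def alignment_scores(src: Embeddings, tgt: Embeddings,
                     pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """Hypothetical metric: mean cosine of translation pairs, plus
    precision@1 when retrieving each source word's nearest target word."""
    sims = [cosine(src[s], tgt[t]) for s, t in pairs]
    hits = 0
    for s, t in pairs:
        # Nearest neighbour among all target-language words in the toy vocabulary.
        nearest = max(tgt, key=lambda w: cosine(src[s], tgt[w]))
        hits += int(nearest == t)
    return {"mean_cosine": sum(sims) / len(sims), "p_at_1": hits / len(pairs)}


def layerwise_alignment(src_layers: List[Embeddings], tgt_layers: List[Embeddings],
                        pairs: List[Tuple[str, str]]) -> List[Dict[str, float]]:
    """Apply alignment_scores independently at every layer."""
    return [alignment_scores(s, t, pairs) for s, t in zip(src_layers, tgt_layers)]


# Toy English-Swahili example with made-up 2-d vectors (not model outputs).
pairs = [("water", "maji"), ("fire", "moto")]
en_layers = [
    {"water": [1.0, 0.0], "fire": [0.0, 1.0]},
    {"water": [1.0, 0.0], "fire": [0.0, 1.0]},
]
sw_layers = [
    {"maji": [0.0, 1.0], "moto": [1.0, 0.0]},  # layer 0: translations misaligned
    {"maji": [1.0, 0.0], "moto": [0.0, 1.0]},  # layer 1: translations aligned
]

for i, scores in enumerate(layerwise_alignment(en_layers, sw_layers, pairs)):
    print(f"layer {i}: {scores}")
```

On the toy data, layer 0 scores 0.0 on both measures and layer 1 scores 1.0 on both, mimicking a layer-wise difference in alignment. Real use would replace the toy dictionaries with per-layer hidden states extracted from each model (for example, mean-pooled over a word's subword tokens); how the paper pools representations and which word lists it uses are not specified here.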
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 61