Abstract: The training process of Large Language Models (LLMs) requires extensive text corpora. However, these corpora are often unevenly distributed across languages. As a result, LLMs perform well on common languages, such as English, German, and French, but poorly on low-resource languages. To date, however, no work has quantitatively measured the performance of LLMs on low-resource languages. To fill this gap, we propose the Language Ranker, which benchmarks and ranks languages according to LLM performance on them. We use an LLM's performance on an English corpus as a baseline against which its performance on other languages is compared. We report three findings: (1) the performance rankings of different LLMs across languages are roughly the same; (2) LLMs of different sizes exhibit the same partial ordering of performance; (3) LLaMA2's performance across languages correlates strongly with the proportion of each language in the pre-training corpus. These findings illustrate that the Language Ranker can serve as an indicator of LLM performance across different languages.
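For intuition, here is a minimal sketch of the ranking and correlation analysis the abstract describes. All scores and corpus proportions below are invented placeholders, and the specific scoring function (performance measured relative to an English baseline) is assumed for illustration rather than taken from the paper.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-language performance scores for one LLM, measured
# against an English baseline as the abstract describes (higher = closer
# to English-level performance). Values are illustrative placeholders.
scores = {
    "German":  0.82,
    "French":  0.80,
    "Spanish": 0.78,
    "Swahili": 0.41,
    "Yoruba":  0.33,
}

# Hypothetical proportion of each language in the pre-training corpus.
corpus_share = {
    "German":  0.017,
    "French":  0.016,
    "Spanish": 0.015,
    "Swahili": 0.0002,
    "Yoruba":  0.0001,
}

# Rank languages by their score relative to the English baseline.
ranking = sorted(scores, key=scores.get, reverse=True)
print("Language ranking:", ranking)

# Finding 3: correlate per-language performance with the proportion of
# the pre-training corpus in that language.
langs = list(scores)
rho, p = spearmanr([scores[l] for l in langs],
                   [corpus_share[l] for l in langs])
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```

Under this sketch, a high Spearman correlation between the two lists would reproduce the paper's third finding: languages with a larger share of the pre-training corpus rank higher.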
Paper Type: Short
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: Multilingual Performance, LLM
Contribution Types: Model analysis & interpretability
Languages Studied: English, high-resource languages, and low-resource languages
Submission Number: 359