How Multilingual is Multilingual LLM?Download PDF

Anonymous

16 Dec 2023ACL ARR 2023 December Blind SubmissionReaders: Everyone
TL;DR: Investigate the multilingual capacities of Large Language Model
Abstract: Large Language Models (LLMs), trained predominantly on extensive English data, often exhibit limitations when applied to other languages. Current research is primarily focused on enhancing the multilingual capabilities of these models by employing various tuning strategies. Despite their effectiveness in certain languages, the understanding of the multilingual abilities of LLMs remains incomplete. This study endeavors to evaluate the multilingual capacity of LLMs by conducting an exhaustive analysis across 101 languages, and classifies languages with similar characteristics into four distinct quadrants. By delving into each quadrant, we shed light on the rationale behind their categorization and offer actionable guidelines for tuning these languages. Extensive experiments reveal that existing LLMs possess multilingual capabilities that surpass our expectations, and we can significantly improve the multilingual performance of LLMs by focusing on these attributes of each quadrant~\footnote{We will release the model and code to the public.}.
Paper Type: long
Research Area: Multilinguality and Language Diversity
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings
Languages Studied: af,am,ar,hy,as,ast,az,be,bn,bs,bg,my,ca,ceb,zh,zhtrad,hr,cs,da,nl,en,et,tl,fi,fr,ff,gl,lg,ka,de,el,gu,ha,he,hi,hu,is,ig,id,ga,it,ja,jv,kea,kam,kn,kk,km,ko,ky,lo,lv,ln,lt,luo,lb,mk,ms,ml,mt,mi,mr,mn,ne,ns,no,ny,oc,or,om,ps,fa,pl,pt,pa,ro,ru,sr,sn,sd,sk,sl,so,ku,es,sw,sv,tg,ta,te,th,tr,uk,umb,ur,uz,vi,cy,wo,xh,yo,zu
0 Replies

Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview