How Multilingual is LLaMA?

Anonymous

16 Feb 2024, ACL ARR 2024 February Blind Submission, Readers: Everyone
Abstract: Large Language Models (LLMs) often show strong performance on English tasks while exhibiting limitations in other languages. What is an LLM's multilingual capability when it is trained only on certain languages? The underlying mechanism remains unclear. This study examines the multilingual capability of LLMs through an extensive analysis across 101 languages. By investigating the performance gap before and after embedding fine-tuning, we identify four distinct quadrants. By examining each quadrant, we provide actionable and efficient guidelines for tuning the languages within it. Extensive experiments reveal that existing LLMs possess multilingual capabilities that surpass our expectations, and that their multilingual performance can be significantly improved based on the attributes of each quadrant. We will release the model and code to the public.
Paper Type: long
Research Area: Multilinguality and Language Diversity
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings
Languages Studied: af,am,ar,hy,as,ast,az,be,bn,bs,bg,my,ca,ceb,zh,zhtrad,hr,cs,da,nl,en,et,tl,fi,fr,ff,gl,lg,ka,de,el,gu,ha,he,hi,hu,is,ig,id,ga,it,ja,jv,kea,kam,kn,kk,km,ko,ky,lo,lv,ln,lt,luo,lb,mk,ms,ml,mt,mi,mr,mn,ne,ns,no,ny,oc,or,om,ps,fa,pl,pt,pa,ro,ru,sr,sn,sd,sk,sl,so,ku,es,sw,sv,tg,ta,te,th,tr,uk,umb,ur,uz,vi,cy,wo,xh,yo,zu