Abstract: As instruction-tuned large language models (LLMs) gain global adoption, their ability to follow instructions in multiple languages becomes increasingly crucial. In this work, we investigate how multilinguality during instruction tuning of a multilingual LLM affects instruction-following across languages from the pre-training corpus. We first show that many languages transfer some instruction-following capabilities to other languages even from monolingual tuning. Furthermore, we find that only 40 multilingual examples integrated into an English tuning set substantially improve multilingual instruction-following, in both languages seen and unseen during tuning. In general, we observe that models tuned on multilingual mixtures exhibit comparable or superior performance in multiple languages compared to monolingually tuned models, despite training on 10x fewer examples in those languages. Finally, we find that diversifying the instruction tuning set with even just 2-4 languages significantly improves cross-lingual generalization. Our results suggest that massively multilingual instruction-tuned models can be built with only a very small set of multilingual instruction-response pairs.
Paper Type: long
Research Area: Multilinguality and Language Diversity
Contribution Types: NLP engineering experiment
Languages Studied: Arabic, Chinese, Czech, English, Estonian, Finnish, Hebrew, Hindi, Italian, Russian, Spanish, Swahili
Preprint Status: There is a non-anonymous preprint (URL specified in the next question).
A1: yes
A1 Elaboration For Yes Or No: Section 7
A2: no
A2 Elaboration For Yes Or No: We do not identify risks that our work adds to the existing risks of LLMs.
A3: yes
A3 Elaboration For Yes Or No: Section 1
B: yes
B1: yes
B1 Elaboration For Yes Or No: Section 2,3,4
B2: no
B2 Elaboration For Yes Or No: We use datasets that are publicly available for scientific purposes, and do not release artifacts.
B3: no
B3 Elaboration For Yes Or No: We use the publicly available academic datasets consistent with their intended use.
B4: no
B4 Elaboration For Yes Or No: We use publicly available academic datasets that are widely used by the research community.
B5: no
B5 Elaboration For Yes Or No: We use publicly available academic datasets that are widely used by the research community.
B6: yes
C: yes
C1: no
C1 Elaboration For Yes Or No: We are not allowed to disclose the exact number of parameters the model has.
C2: yes
C3: yes
C4: n/a
D: yes
D1: yes
D2: no
D2 Elaboration For Yes Or No: The annotators are colleagues (engineers) who volunteered to help with our study.
D3: no
D3 Elaboration For Yes Or No: The annotators are colleagues (engineers) who volunteered to help with our study.
D4: no
D4 Elaboration For Yes Or No: The annotators are colleagues (engineers) who volunteered to help with our study, and the data collected consisted only of their preferences among different model responses to prompts commonly used by the research community for model evaluation in this way.
D5: no
D5 Elaboration For Yes Or No: The annotators are colleagues (engineers) who volunteered to help with our study.
E: yes
E1: no
E1 Elaboration For Yes Or No: We used AI assistants for technical help with our plots.