Keywords: LLM, multilingual, model merging, multitask, LoRA
TL;DR: This paper leverages model merging techniques to combine independently trained language adapters, improving training efficiency.
Abstract: Fine-tuning a task-specific multilingual large language model (LLM) involves training the model on a multilingual dataset with examples in all the required languages. Updating one or more supported languages with additional data, or adding support for a new language, requires retraining the model, which is computationally inefficient. Recent research on merging multiple task-specific models has shown promise in terms of both computational efficiency and improved performance. However, these approaches only consider multiple tasks in a single language; their effectiveness for merging language-specific models trained on a single task is underexplored. In this work, we evaluate existing model merging approaches in a multilingual setting on three independent tasks. Our experiments show that model merging achieves performance on par with both models trained on a combined dataset of multiple languages and language-specific fine-tuned models. Our analysis indicates that merging improves training efficiency, reducing the time needed to add or update a language by a factor of 2.5 and cutting training costs by a factor of 3.
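The abstract refers to existing model merging approaches without specifying them; as a rough illustration only, the sketch below shows one common merging baseline, uniform weight averaging of independently trained per-language adapters. It assumes PyTorch state dicts, and the checkpoint names in `adapter_paths` are hypothetical; this is not necessarily the paper's exact method.

```python
# A minimal sketch of one model-merging baseline: averaging the weights of
# independently trained language adapters. Assumes each adapter is saved as
# a PyTorch state dict; file names below are hypothetical placeholders.
import torch

def merge_adapters(state_dicts, weights=None):
    """Merge adapter state dicts by (optionally weighted) averaging."""
    if weights is None:
        # Default to a uniform average over all adapters.
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        # Weighted sum of the corresponding tensor across all adapters.
        merged[key] = sum(w * sd[key] for w, sd in zip(weights, state_dicts))
    return merged

# Usage: load per-language LoRA adapter checkpoints and merge them into one.
adapter_paths = ["lora_en.pt", "lora_de.pt", "lora_hi.pt"]  # hypothetical files
language_adapters = [torch.load(p, map_location="cpu") for p in adapter_paths]
merged_adapter = merge_adapters(language_adapters)
torch.save(merged_adapter, "lora_merged.pt")
```

Under this scheme, adding or updating a language only requires training that language's adapter and re-running the cheap merge step, rather than retraining on the full combined multilingual dataset, which is consistent with the efficiency gains the abstract reports.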
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 19889