Keywords: LLM Merging, Model Merging, Geometric Median, Task Vector, Weiszfeld's Algorithm
TL;DR: This paper presents an efficient model merging method for large language models (LLMs), where multiple fine-tuned models are combined using a geometric median approach to create a unified model capable of handling diverse tasks.
Abstract: Training high-performing large language models (LLMs) from scratch is an expensive and complex task. Model merging techniques offer a more computationally efficient alternative, where pretrained LLMs are fine-tuned on specific tasks and then combined to produce a versatile model capable of handling a broad range of tasks, including reasoning, coding, mathematics, conversation, and tool usage. Unlike traditional fine-tuning or ensemble methods, our approach to merging is less computationally intensive. We represent each fine-tuned model with a "Task Vector" relative to a pretrained "Base LLM", derived from the LoRA (Low-Rank Adaptation) weights of the fine-tuned models. By computing the geometric median of these task vectors in high-dimensional space using Weiszfeld's iterative algorithm and adding it to the "Base LLM" weights, we create a unified model that generalizes effectively across tasks. This efficient method achieves state-of-the-art performance on benchmark tests while reducing computational demands.
Code available at https://github.com/iMmOrTaL2121/geometric_median_llm_merging.git
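For illustration only, here is a minimal sketch of the core step the abstract describes: computing the geometric median of flattened task vectors with Weiszfeld's iterative algorithm and adding it to the base model's weights. The function names, the epsilon safeguard, and the convergence tolerance are assumptions for this sketch, not the paper's actual implementation; see the linked repository for the authors' code.

```python
import numpy as np

def weiszfeld_geometric_median(points, n_iters=100, eps=1e-8, tol=1e-9):
    """Approximate the geometric median of the rows of `points`
    via Weiszfeld's iterative reweighting scheme."""
    median = points.mean(axis=0)  # initialize at the arithmetic mean
    for _ in range(n_iters):
        dists = np.linalg.norm(points - median, axis=1)
        dists = np.maximum(dists, eps)  # guard against division by zero
        weights = 1.0 / dists           # inverse-distance weights
        new_median = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_median - median) < tol:  # converged
            return new_median
        median = new_median
    return median

# Hypothetical usage: each task vector is the (flattened) difference between
# a fine-tuned model's weights and the base model's weights.
# task_vectors: array of shape (num_models, num_params)
# merged_weights = base_weights + weiszfeld_geometric_median(task_vectors)
```

Unlike simple task-vector averaging, the geometric median down-weights outlying task vectors, which is the robustness property the method relies on.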
Submission Number: 8