Keywords: LLM Merging, Model Merging, Geometric Median, Task Vector, Weiszfeld's Algorithm
TL;DR: This paper presents an efficient model merging method for large language models (LLMs), where multiple fine-tuned models are combined using a geometric median approach to create a unified model capable of handling diverse tasks.
Abstract: Training high-performing large language models (LLMs) from scratch is an expensive and complex task. Model merging techniques offer a more computationally efficient alternative, where pretrained LLMs are fine-tuned on specific tasks and then combined to produce a versatile model capable of handling a broad range of tasks, including reasoning, coding, mathematics, conversation, and tool usage. Unlike traditional fine-tuning or ensemble methods, our approach to merging is less computationally intensive. We represent each fine-tuned model with a "Task Vector" relative to a pretrained "Base LLM", derived from the LoRA (Low-Rank Adaptation) weights of the fine-tuned models. By computing the geometric median of these task vectors in high-dimensional space using Weiszfeld's iterative algorithm and adding it to the "Base LLM" weights, we create a unified model that generalizes effectively across tasks. This efficient method achieves state-of-the-art performance on benchmark tests while reducing computational demands.
Code available at https://github.com/iMmOrTaL2121/geometric_median_llm_merging.git
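For illustration only, here is a minimal sketch of the core step the abstract describes: computing the geometric median of flattened task vectors with Weiszfeld's iterative algorithm and adding it to the base model's weights. The function names, the epsilon safeguard, and the convergence tolerance are assumptions for this sketch, not the paper's actual implementation; see the linked repository for the authors' code.

```python
import numpy as np

def weiszfeld_geometric_median(points, n_iters=100, eps=1e-8, tol=1e-9):
    """Approximate the geometric median of the rows of `points`
    via Weiszfeld's iterative reweighting scheme."""
    median = points.mean(axis=0)  # initialize at the arithmetic mean
    for _ in range(n_iters):
        dists = np.linalg.norm(points - median, axis=1)
        dists = np.maximum(dists, eps)  # guard against division by zero
        weights = 1.0 / dists           # inverse-distance weights
        new_median = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_median - median) < tol:  # converged
            return new_median
        median = new_median
    return median

# Hypothetical usage: each task vector is the (flattened) difference between
# a fine-tuned model's weights and the base model's weights.
# task_vectors: array of shape (num_models, num_params)
# merged_weights = base_weights + weiszfeld_geometric_median(task_vectors)
```

Unlike simple task-vector averaging, the geometric median down-weights outlying task vectors, which is the robustness property the method relies on.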
Submission Number: 8