Keywords: Large Language Model, Model Merging, Multitasking
Abstract: Fine-tuning large language models (LLMs) on task-specific data provides strong in-domain performance but limits generalization and requires storing many specialized models. Retraining a unified multitask model is often infeasible, as it demands task-specific training data that may be unavailable, raise privacy concerns, or incur prohibitive computational costs. Model merging has been proposed as an alternative that integrates the distinct strengths of several fine-tuned models into a single, comprehensive model. Most model merging approaches perform arithmetic operations directly on model parameters. Although research in model merging has expanded significantly in recent years, two approaches have become dominant: 1) techniques that mitigate interference from redundant parameters and sign conflicts, and 2) techniques that account for the varying sensitivity of individual parameters. However, these two lines of work have so far developed independently, and neither exploits the other's strengths. In this work, we unify these two well-established yet disconnected approaches by integrating insights from both.
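For reference, arithmetic-based merging of the kind the abstract describes (e.g., task arithmetic) can be sketched as follows; the scaling factor `alpha` and all function names are illustrative assumptions, not details from the paper:

```python
# Illustrative baseline, not the paper's method: plain task-vector
# arithmetic. `alpha` and the function names are assumptions.
import torch

def task_vector(base: dict, finetuned: dict) -> dict:
    """Task vector = fine-tuned weights minus pretrained weights."""
    return {k: finetuned[k] - base[k] for k in base}

def merge_by_task_arithmetic(base: dict, experts: list, alpha: float = 0.3) -> dict:
    """Add the scaled sum of task vectors back onto the pretrained base."""
    merged = {k: v.clone() for k, v in base.items()}
    for expert in experts:
        tv = task_vector(base, expert)
        for k in merged:
            merged[k] += alpha * tv[k]
    return merged
```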
We propose DRIFT-MEDIAN, a unified model-merging framework that leverages Fisher information to assign appropriate weights to task vectors.
Our core contribution is a closed-form solution to a loss function grounded in the Fisher-weighted median. This formulation ensures that parameter contributions reflect both sensitivity and relevance, leading to more robust model merging.
This mechanism prioritizes parameters with high task-specific sensitivity in the merged representation, while naturally diminishing the influence of less important parameters.
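Since the paper's closed-form solution is not reproduced here, the following is only a minimal sketch of how a per-parameter Fisher-weighted median of task vectors could be computed; the `(K, D)` tensor layout and all names are assumptions for illustration:

```python
# A sketch of a coordinate-wise Fisher-weighted median over task vectors.
# The (K, D) layout and names are illustrative assumptions; this is the
# general idea, not DRIFT-MEDIAN's specific formulation.
import torch

def fisher_weighted_median(task_vectors: torch.Tensor,
                           fisher: torch.Tensor) -> torch.Tensor:
    """task_vectors: (K, D) task vectors from K fine-tuned models.
    fisher: (K, D) per-parameter Fisher information, used as weights.
    Returns the (D,) coordinate-wise Fisher-weighted median."""
    order = task_vectors.argsort(dim=0)          # sort values per coordinate
    sorted_vals = task_vectors.gather(0, order)
    sorted_w = fisher.gather(0, order)           # carry weights along with values
    cum_w = sorted_w.cumsum(dim=0)
    half = sorted_w.sum(dim=0, keepdim=True) / 2
    # First sorted position whose cumulative weight reaches half the total mass
    idx = (cum_w < half).sum(dim=0)
    return sorted_vals.gather(0, idx.unsqueeze(0)).squeeze(0)
```

A merged model would then be obtained by adding this median (optionally scaled) back onto the pretrained weights; parameters with large Fisher values for some task pull the median toward that task's update, matching the prioritization described above.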
Comprehensive experiments on Llama-3.1-8B, Llama-3.2-3B, Llama-2-7b, GPT-2, and CLIP-ViT-B/32 across mathematics, coding, multilingual reasoning, safety, instruction following, the GLUE benchmark, and vision tasks demonstrate that DRIFT-MEDIAN outperforms existing model merging methods.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 8987