Keywords: model merging, large language models
Abstract: Fine-tuning large language models provides strong in-domain performance but limits generalization and requires storing many specialized models. Retraining a unified multitask model is often infeasible due to data unavailability or high computational cost. Most model merging approaches instead perform arithmetic operations directly on model parameters.
Although research in model merging has expanded significantly in recent years, two distinct approaches have become dominant: 1) techniques that mitigate interference from redundant parameters and sign conflicts, and 2) techniques that account for the varying sensitivity of individual parameters. However, these two lines of work have developed independently and do not exploit each other's strengths. In this work, we unify these two well-established yet disconnected approaches by integrating insights from both.
We propose DRIFT-MEDIAN, a Fisher-aware model merging method that combines task vectors through a closed-form Fisher-weighted median, weighting each parameter by its sensitivity so that task-relevant parameters dominate the merged model.
Comprehensive experiments on several LLMs and CLIP models demonstrate that DRIFT-MEDIAN outperforms existing model merging methods.
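To illustrate the core idea, here is a minimal sketch of a coordinate-wise Fisher-weighted median merge. This is an illustrative reconstruction, not the authors' implementation: the function names (`weighted_median`, `fisher_median_merge`) and the flattened-parameter representation are assumptions, and the Fisher importances are taken as given per-task, per-parameter weights.

```python
# Hypothetical sketch (not the paper's code): merge fine-tuned models by
# taking, for each parameter coordinate, the weighted median of the task
# vectors, with per-task Fisher importances as the weights.
import numpy as np

def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    idx = np.searchsorted(cum, 0.5 * cum[-1])
    return v[idx]

def fisher_median_merge(base, task_params, fishers):
    """Merge fine-tuned parameter vectors into `base` via a
    coordinate-wise Fisher-weighted median of their task vectors."""
    task_vectors = np.stack([p - base for p in task_params])  # shape (T, D)
    fisher = np.stack(fishers)                                # shape (T, D)
    merged_delta = np.array([
        weighted_median(task_vectors[:, j], fisher[:, j])
        for j in range(base.shape[0])
    ])
    return base + merged_delta
```

Under this sketch, a coordinate with a large Fisher weight for one task pulls the merged value toward that task's update, which matches the paper's goal of letting task-relevant parameters dominate.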
Paper Type: Long
Research Area: Language Models
Research Area Keywords: model editing, robustness
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings - efficiency
Languages Studied: English
Submission Number: 5546