Keywords: Model Merging, Representation Learning, Resolving Interference, Distillation
TL;DR: A lightweight adaptation strategy that reduces cross-task interference to improve the performance of existing merging methods.
Abstract: Model merging has shown that multitask models can be created by directly combining the parameters of different models, each specialized on a task of interest. However, models trained independently on distinct tasks often exhibit interference that degrades the merged model's performance. To address this problem, we formally define 'Cross-Task Interference' as the drift of the merged model's representations relative to those of its constituent models; reducing this interference is key to improving merging performance. We propose 'Resolving Interference (RI)', a lightweight framework that disentangles expert models to be functionally orthogonal to the space of other tasks, thereby reducing cross-task interference. RI requires only unlabeled auxiliary data as input (i.e., no task data is needed), allowing it to be applied in data-scarce scenarios. RI consistently improves the performance of existing merging methods by up to 10% and generalization to unseen domains by up to 2.3%. We also find RI to be robust to the source of auxiliary input while being significantly less sensitive to the tuning of merging hyperparameters.
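The abstract does not give the paper's formal definition, but the idea of cross-task interference as representation drift can be illustrated with a toy sketch. All names below (`cross_task_interference`, the linear "models") are hypothetical, assuming interference is measured as the mean distance between the merged model's features and each expert's features on an unlabeled auxiliary batch:

```python
import numpy as np

def cross_task_interference(merged_W, expert_Ws, aux_X):
    """Hypothetical proxy: mean L2 drift of the merged model's
    representations from each expert's, computed on unlabeled
    auxiliary inputs (no task labels required)."""
    merged_feats = aux_X @ merged_W
    drifts = []
    for W in expert_Ws:
        expert_feats = aux_X @ W
        # Per-example drift between merged and expert representations
        drifts.append(np.linalg.norm(merged_feats - expert_feats, axis=1).mean())
    return float(np.mean(drifts))

rng = np.random.default_rng(0)
experts = [rng.normal(size=(16, 8)) for _ in range(3)]  # toy linear "experts"
merged = np.mean(experts, axis=0)   # simple weight averaging as the merge
aux = rng.normal(size=(32, 16))     # unlabeled auxiliary batch
print(cross_task_interference(merged, experts, aux))
```

Under this reading, an adaptation step like RI would aim to reduce this quantity before or during merging; the actual definition and procedure are in the paper.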
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 5683