MIMA: Iterative Model Averaging and Fine-Tuning for Multi-Task Learning

ICLR 2026 Conference Submission 18631 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Transfer Learning, Multi-Task Learning, Model Merging
Abstract: Fine-tuning large pre-trained models on downstream tasks has become standard practice, but constructing multi-task models that combine isolated task-specialised models remains challenging. Task Arithmetic, a recent approach, merges multiple task-specific models into a single multi-task network simply by adding their ``task vectors'', without revisiting the original training data. In practice, however, such model merging often results in substantial performance degradation. We show that independent fine-tuning of each model pushes the task vectors towards orthogonal directions in parameter space. We hypothesise that actively aligning task vectors during fine-tuning will improve the performance of merged models. To test this hypothesis, we propose an iterative model averaging and fine-tuning framework called \textbf{MIMA}, which stands for \textbf{M}ulti-Task \textbf{I}terated \textbf{M}odel \textbf{A}veraging. We demonstrate that alternating phases of weight averaging and fine-tuning increase the pairwise cosine similarity between task vectors, encouraging knowledge sharing between tasks and preventing any one task vector from drifting too far from a unified model representation. When evaluated on a suite of eight vision benchmark tasks, MIMA retains competitive performance for each fine-tuned model on its single task and reduces the single-task accuracy gap between each fine-tuned model and the merged model to nearly zero, indicating near-complete alignment between the task vectors. Our work reveals new insights into the geometric relationships between task vectors in Task Arithmetic and presents a more effective framework for editing the behaviour of pre-trained models towards multi-task learning.
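The abstract describes task vectors, Task Arithmetic merging, and an alternation of weight averaging and per-task fine-tuning. The sketch below is one plausible interpretation of that loop, not the authors' implementation; the helper `fine_tune` and the arguments `task_loaders`, `rounds`, and `alpha` are hypothetical placeholders.

```python
# Minimal sketch (assumed, not the authors' code): task vectors, Task Arithmetic
# merging, and a MIMA-style alternation of fine-tuning and weight averaging.
import copy
import torch
import torch.nn as nn


def task_vector(pretrained: nn.Module, finetuned: nn.Module) -> dict:
    """Task vector = fine-tuned weights minus pre-trained weights."""
    p0 = pretrained.state_dict()
    return {k: v - p0[k] for k, v in finetuned.state_dict().items()}


def cosine_similarity(tv_a: dict, tv_b: dict) -> float:
    """Pairwise cosine similarity between two flattened task vectors."""
    a = torch.cat([v.flatten().float() for v in tv_a.values()])
    b = torch.cat([v.flatten().float() for v in tv_b.values()])
    return torch.nn.functional.cosine_similarity(a, b, dim=0).item()


def merge_task_arithmetic(pretrained: nn.Module, task_vectors: list, alpha: float = 1.0) -> nn.Module:
    """Task Arithmetic: add the scaled sum of task vectors to the pre-trained weights."""
    merged = copy.deepcopy(pretrained)
    state = merged.state_dict()
    for k in state:
        state[k] = state[k] + alpha * sum(tv[k] for tv in task_vectors)
    merged.load_state_dict(state)
    return merged


def average_weights(models: list) -> dict:
    """Uniform weight average over models with identical architectures."""
    avg = {}
    for k, v in models[0].state_dict().items():
        if v.is_floating_point():
            avg[k] = torch.stack([m.state_dict()[k] for m in models]).mean(dim=0)
        else:
            avg[k] = v.clone()  # integer buffers (e.g. BatchNorm counters) are not averaged
    return avg


def mima(pretrained: nn.Module, task_loaders: list, fine_tune, rounds: int = 3):
    """One reading of the MIMA loop: alternate per-task fine-tuning and weight averaging.

    `fine_tune(model, loader)` is a hypothetical helper that fine-tunes `model`
    on one task's data and returns the updated model.
    """
    shared = copy.deepcopy(pretrained)
    models = []
    for _ in range(rounds):
        # Fine-tune a copy of the current shared model on each task.
        models = [fine_tune(copy.deepcopy(shared), loader) for loader in task_loaders]
        # Average the fine-tuned weights to obtain the next shared initialisation;
        # restarting every task from this average keeps the task vectors aligned.
        shared.load_state_dict(average_weights(models))
    return models, shared
```

In this reading, the returned per-task models supply the task vectors (relative to the original pre-trained weights), and the final shared model plays the role of the merged multi-task network whose accuracy gap to each fine-tuned model the abstract reports as nearly zero.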
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 18631