Supplementary Material: zip
Track: Proceedings Track
Keywords: multitask learning, gradient, task arithmetic, model merging
TL;DR: We prove task vectors approximate multitask gradients early in training, explaining why summing them merges models effectively, even after just one epoch.
Abstract: Task arithmetic has emerged as a simple yet powerful technique for model merging, enabling the combination of multiple finetuned models into a single model. Despite its empirical success, a clear theoretical understanding of why and when it works has been lacking. This paper provides a rigorous theoretical foundation for task arithmetic by establishing a direct connection between task vectors and gradients of the task losses. We show that under standard gradient descent, a task vector generated from one epoch of finetuning is exactly equivalent to the negative gradient of the loss, scaled by the learning rate. For the practical multi-epoch setting, we prove that this equivalence holds approximately, with a second-order error term that we explicitly bound for feed-forward networks. Our empirical analysis across seven vision benchmarks corroborates our theory, demonstrating that the first-epoch gradient dominates the finetuning trajectory in both norm and direction. A key implication is that merging models finetuned for only a single epoch often yields performance comparable to merging fully converged models. These findings reframe task arithmetic as a form of approximate multitask learning, providing a clear rationale for its effectiveness and highlighting the critical role of early training dynamics in model merging.
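The abstract's central claim — that one full-batch gradient-descent step of finetuning yields a task vector exactly equal to the negative loss gradient scaled by the learning rate — can be checked numerically. The sketch below uses a hypothetical linear model with a mean-squared-error loss as a stand-in; it is illustrative only and not the paper's experimental setup.

```python
import numpy as np

# Illustrative check: task vector after one gradient-descent step
# equals -lr * gradient. Model, data, and loss are hypothetical stand-ins.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # inputs
y = rng.normal(size=(8,))     # targets
w0 = rng.normal(size=(3,))    # "pretrained" weights
lr = 0.1

def grad(w):
    # Gradient of the mean-squared-error loss 0.5 * mean((X @ w - y)**2)
    return X.T @ (X @ w - y) / len(y)

# One full-batch gradient-descent step (one "epoch" of finetuning)
w1 = w0 - lr * grad(w0)

# Task vector = finetuned weights minus pretrained weights
task_vector = w1 - w0

# Exactly the negative gradient scaled by the learning rate
assert np.allclose(task_vector, -lr * grad(w0))
```

With multiple epochs, the equivalence becomes approximate: later steps take gradients at updated weights, introducing the second-order error term the abstract says the paper bounds.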
Submission Number: 9