Update Your Transformer to the Latest Release: Re-Basin of Task Vectors

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, License: CC BY-NC-ND 4.0
TL;DR: We propose a data-free method to transfer fine-tuned Transformer model updates to newer pre-trained versions using weight permutations, enabling fine-tuned models to be updated without retraining.
Abstract: Foundation models serve as the backbone for numerous specialized models developed through fine-tuning. However, when the underlying pre-trained model is updated or retrained (e.g., on larger and more curated datasets), the fine-tuned model becomes obsolete, losing its utility and requiring retraining. This raises the question: is it possible to transfer fine-tuning to a new release of the model? In this work, we investigate how to transfer fine-tuning to a new checkpoint in a data-free manner, without having to re-train. To do so, we draw principles from model re-basin and provide a recipe based on weight permutations to re-base the modifications made to the original base model, often called the task vector. In particular, our approach tailors model re-basin to Transformer models, taking into account the challenges posed by residual connections and multi-head attention layers. Specifically, we propose a two-level method rooted in spectral theory, initially permuting the attention heads and subsequently adjusting parameters within select pairs of heads. Through extensive experiments on visual and textual tasks, we achieve the seamless transfer of fine-tuned knowledge to new pre-trained backbones without relying on a single training step or datapoint. Code is available at https://github.com/aimagelab/TransFusion.
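For intuition, the sketch below illustrates the core re-basin idea for a single linear layer: compute the task vector on the old checkpoint, align the old and new checkpoints with a permutation found by a simple weight-matching (linear-assignment) heuristic, and apply the permuted task vector to the new checkpoint. This is a minimal, hypothetical example, not the authors' implementation; TransFusion's actual two-level spectral procedure for attention heads and its handling of residual connections are in the repository linked above.

```python
# Minimal sketch (assumption: NOT the authors' code) of data-free task-vector
# re-basing for one linear layer via permutation weight matching.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_rows(w_old: np.ndarray, w_new: np.ndarray) -> np.ndarray:
    """Return a permutation of w_old's rows that best aligns them with w_new
    (simple weight-matching solved as a linear assignment problem)."""
    similarity = w_new @ w_old.T              # similarity[i, j] = <new row i, old row j>
    _, cols = linear_sum_assignment(-similarity)  # maximize total similarity
    return cols                                # cols[i] = old row assigned to new row i

rng = np.random.default_rng(0)
d_out, d_in = 8, 16
w_old = rng.standard_normal((d_out, d_in))                    # old pre-trained layer
w_old_ft = w_old + 0.1 * rng.standard_normal((d_out, d_in))   # fine-tuned on the old release
w_new = w_old[rng.permutation(d_out)]                         # toy "new release": same units, reordered

tau = w_old_ft - w_old          # task vector, expressed in the old model's basis
perm = match_rows(w_old, w_new) # data-free alignment of old units to new units
w_new_ft = w_new + tau[perm]    # re-based task vector applied to the new checkpoint

print("alignment error:", np.abs(w_new - w_old[perm]).max())
```

In a full network the per-layer permutations must be composed consistently (a permutation of one layer's output units also permutes the next layer's input dimension), and Transformers additionally require head-level permutations for multi-head attention and shared permutations along the residual stream, which is what the paper's two-level method addresses.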
Lay Summary: In artificial intelligence, foundation models are large systems pre-trained on extensive datasets and often fine-tuned for specific tasks. When these models are updated, fine-tuned versions become outdated and require expensive retraining, which can be impractical if the original training data is inaccessible. We present "TransFusion," a novel approach that transfers fine-tuning from older Transformer model versions to updated ones without any additional training or data. By realigning specialized knowledge within the model's parameters using weight permutations, our method effectively "re-bases" the fine-tuning. This two-stage alignment process addresses the complexities of Transformer models, preserving their original functionality while enhancing performance on targeted tasks. Our experimental results confirm that TransFusion successfully transfers specialized knowledge, reducing costs and making advanced AI solutions more accessible for real-world applications.
Link To Code: https://github.com/aimagelab/TransFusion
Primary Area: General Machine Learning->Transfer, Multitask and Meta-learning
Keywords: weight interpolation, transfer learning, model rebasin, model editing, model patching
Submission Number: 7266