Model Breadcrumbs: Crafting Multi-Task Models from Pre-Existing Fine-Tuned Foundation Models

22 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: pre-trained models, fine-tuning, transfer learning, weight interpolation, merging models, sparse models
TL;DR: Model Breadcrumbs: a novel method for efficiently merging pre-trained foundation models fine-tuned across diverse tasks, enabling improved multi-task learning and facilitating collaborative model updates in the evolving landscape of AI development.
Abstract: The rapid evolution of AI system development has been greatly influenced by the emergence of foundation models. The prevailing approach is to fine-tune these pre-trained foundation models for specific target tasks, which has led to a proliferation of models fine-tuned across a diverse array of tasks. This paper introduces a strategy termed "Model Breadcrumbs", which addresses the need to merge multiple fine-tunings of the same foundation model across a spectrum of auxiliary tasks. A Model Breadcrumb is a sparsely defined set of weights that carves out a trajectory within the weight space of a pre-trained model and is designed to enhance task performance when traversed. Model Breadcrumbs are constructed by subtracting a pre-trained model's weights from the same model's weights after fine-tuning, followed by a sparsification process that removes both weight outliers and negligible perturbations. Our experiments demonstrate the effectiveness of combining Model Breadcrumbs to simultaneously improve performance across multiple tasks. This contribution aligns with the evolving paradigm of updatable machine learning, reminiscent of the collaborative principles underlying open-source software development, and fosters a community-driven effort to reliably update machine learning models. Through extensive experimentation encompassing various models and tasks, we establish that integrating Model Breadcrumbs is a straightforward, efficient, and highly effective approach for constructing multi-task models and facilitating updates to foundation models.
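To make the construction concrete, the sketch below shows one plausible reading of the procedure described in the abstract: take the weight difference between the fine-tuned and pre-trained checkpoints, mask out both the largest-magnitude entries (outliers) and the smallest-magnitude entries (negligible perturbations), and add a scaled sum of the resulting sparse directions back onto the pre-trained weights. The function names, the percentile thresholds top_pct/bottom_pct, and the scaling factor alpha are illustrative assumptions, not values or code taken from the paper.

```python
import torch

def make_breadcrumb(pretrained_sd, finetuned_sd, top_pct=0.01, bottom_pct=0.85):
    """Sketch of building one Model Breadcrumb from a pair of checkpoints.

    Hypothetical implementation: thresholds and names are illustrative,
    not the authors' reference code.
    """
    breadcrumb = {}
    for name, w_pre in pretrained_sd.items():
        diff = finetuned_sd[name] - w_pre          # fine-tuning direction for this tensor
        if not torch.is_floating_point(diff):
            continue                               # skip integer buffers, counters, etc.
        mag = diff.abs().flatten().float()
        hi = torch.quantile(mag, 1.0 - top_pct)    # cutoff for large-magnitude outliers
        lo = torch.quantile(mag, bottom_pct)       # cutoff for negligible perturbations
        mask = (diff.abs() >= lo) & (diff.abs() <= hi)
        breadcrumb[name] = diff * mask             # keep only mid-magnitude weight changes
    return breadcrumb

def merge_breadcrumbs(pretrained_sd, breadcrumbs, alpha=0.3):
    """Add a scaled sum of per-task breadcrumbs onto the pre-trained weights."""
    merged = {k: v.clone() for k, v in pretrained_sd.items()}
    for bc in breadcrumbs:
        for name, delta in bc.items():
            merged[name] = merged[name] + alpha * delta
    return merged

# Usage sketch: merge breadcrumbs from several fine-tuned checkpoints.
# pre = pretrained_model.state_dict()
# crumbs = [make_breadcrumb(pre, ft.state_dict()) for ft in finetuned_models]
# multitask_model.load_state_dict(merge_breadcrumbs(pre, crumbs))
```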
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6423