MSfusion: Enabling Collaborative Training of Large Models over Resource-Constrained Participants

21 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Collaborative Learning, Large Models, Model Splitting, Contrastive Learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose an effective and efficient collaborative learning framework based on model splitting, for training large models on resource-constrained devices.
Abstract: Training large models like GPT-3 requires large amounts of data as well as abundant computation resources. While collaborative learning (e.g., federated learning) provides a promising paradigm to harness collective data from many participants, training large models remains a major challenge for participants with limited resources. We introduce MSfusion, an effective and efficient collaborative learning framework tailored for training large models on resource-constrained devices through model splitting. Specifically, a double shifting model splitting scheme is designed such that in each training round, each participant is assigned a subset of model parameters to train over local data, and aggregates with sub-models of other peers on common parameters. While model splitting significantly reduces the computation and communication costs of individual participants, additional novel designs for adaptive model overlapping and contrastive loss functions help MSfusion maintain training effectiveness against model shift across participants. Extensive experiments on image and NLP datasets demonstrate the significant advantages of MSfusion in performance and efficiency for training large models, as well as its strong scalability: the computation cost of each participant decreases significantly as the number of participants increases.
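To make the splitting-and-aggregation idea in the abstract concrete, the sketch below is a minimal, hypothetical NumPy illustration (not code from the submission): each participant is assigned a round- and participant-dependent slice of the parameter vector, trains it locally, and updates are averaged only on the coordinates that overlapping sub-models share. The function names, the shifting rule, and the aggregation rule are illustrative stand-ins for the paper's double shifting splitting scheme and overlap-aware aggregation.

```python
import numpy as np

def split_indices(num_params, keep_ratio, participant_id, round_idx):
    """Hypothetical double-shifting split: the parameter block assigned to a
    participant shifts with both its index and the training round, so that
    sub-models of different peers overlap and, over rounds, cover the full model."""
    width = int(num_params * keep_ratio)
    start = (participant_id * (width // 2) + round_idx * (width // 4 + 1)) % num_params
    return np.arange(start, start + width) % num_params  # wrap around the parameter vector

def aggregate_on_overlap(full_model, local_updates):
    """Average each parameter over the participants whose sub-model contains it,
    i.e., aggregate only on the common (overlapping) coordinates."""
    acc = np.zeros_like(full_model)
    count = np.zeros_like(full_model)
    for idx, values in local_updates:
        acc[idx] += values
        count[idx] += 1
    covered = count > 0
    full_model[covered] = acc[covered] / count[covered]
    return full_model

# Toy round: 4 participants, each holding ~40% of a 10-parameter "model".
model = np.zeros(10)
updates = []
for p in range(4):
    idx = split_indices(len(model), keep_ratio=0.4, participant_id=p, round_idx=0)
    local = model[idx] + 0.1 * np.random.randn(len(idx))  # stand-in for local training
    updates.append((idx, local))
model = aggregate_on_overlap(model, updates)
```

Shifting the assigned block with both the participant index and the round number is what lets the full model be covered over time while each device only ever trains a small sub-model per round; the paper's actual scheme, adaptive overlap control, and contrastive loss are more involved than this toy.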
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3368