MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards

Published: 22 Jan 2025 · Last Modified: 15 Feb 2025 · ICLR 2025 Poster · CC BY 4.0
Keywords: LoRA, parameter efficiency, parameter sharing, instruction tuning, NLP
TL;DR: We introduce a more parameter-efficient finetuning method named MoS, and demonstrate its remarkably higher parameter efficiency and other advantages over peer methods, with the hope of establishing it as a resource-friendly alternative to LoRA.
Abstract: The rapid scaling of large language models necessitates more lightweight finetuning methods to reduce the explosive GPU memory overhead when numerous customized models are served simultaneously. Toward more parameter-efficient low-rank adaptation (LoRA), parameter sharing presents a promising solution. Empirically, our research into high-level sharing principles highlights the indispensable role of differentiation in reversing the detrimental effects of pure sharing. Guided by this finding, we propose Mixture of Shards (MoS), which incorporates both inter-layer and intra-layer sharing schemes and integrates four nearly cost-free differentiation strategies: subset selection, pair dissociation, vector sharding, and shard privatization. Briefly, it selects a designated number of shards from global pools with a Mixture-of-Experts (MoE)-like routing mechanism before sequentially concatenating them into low-rank matrices. Hence, it retains all the advantages of LoRA while offering enhanced parameter efficiency, and it effectively circumvents the drawbacks of peer parameter-sharing methods. Our empirical experiments demonstrate approximately $8\times$ parameter savings in a standard LoRA setting. The ablation study confirms the significance of each component. Our insights into parameter sharing and the MoS method may illuminate future developments of more parameter-efficient finetuning methods. The code is officially available at https://github.com/Forence1999/MoS.
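The mechanism sketched in the abstract (global shard pools, MoE-like routing, and concatenation of selected shards into the low-rank factors) can be illustrated with a minimal PyTorch example. This is not the authors' implementation (see the official repository at https://github.com/Forence1999/MoS); class names such as `SharedShardPool` and `MoSLinear` are hypothetical, the routing is simplified to a soft, differentiable mixture over the pool rather than a hard MoE-style selection, and shard privatization is omitted for brevity.

```python
# Minimal sketch of a shard-based LoRA adapter, assuming a PyTorch setting.
# Not the official MoS implementation; routing is simplified to a softmax
# mixture over the global pool, and shard privatization is omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedShardPool(nn.Module):
    """Global pool of shards shared across layers (inter-/intra-layer sharing)."""

    def __init__(self, num_shards: int, shard_len: int):
        super().__init__()
        self.shards = nn.Parameter(torch.randn(num_shards, shard_len) * 0.02)


class MoSLinear(nn.Module):
    """Frozen linear layer plus a low-rank update assembled from shared shards."""

    def __init__(self, base: nn.Linear, pool_a: SharedShardPool,
                 pool_b: SharedShardPool, rank: int = 8,
                 shards_per_vector: int = 2, alpha: float = 16.0):
        super().__init__()
        assert base.in_features % shards_per_vector == 0
        assert base.out_features % shards_per_vector == 0
        self.base = base
        for p in self.base.parameters():          # keep the pretrained weights frozen
            p.requires_grad_(False)
        # Pair dissociation: the A and B factors draw from separate pools.
        self.pool_a, self.pool_b = pool_a, pool_b
        self.rank, self.spv = rank, shards_per_vector
        self.scaling = alpha / rank
        # Per-layer routing logits: one selection per (rank vector, shard slot).
        self.logits_a = nn.Parameter(torch.zeros(rank * shards_per_vector,
                                                 pool_a.shards.size(0)))
        self.logits_b = nn.Parameter(torch.zeros(rank * shards_per_vector,
                                                 pool_b.shards.size(0)))

    def _compose(self, logits: torch.Tensor, pool: SharedShardPool,
                 dim: int) -> torch.Tensor:
        # Subset selection (softened here for differentiability): weight the
        # pooled shards, then concatenate them into full rank vectors
        # (vector sharding).
        weights = F.softmax(logits, dim=-1)       # (rank * spv, num_shards)
        picked = weights @ pool.shards            # (rank * spv, shard_len)
        return picked.reshape(self.rank, dim)     # (rank, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        A = self._compose(self.logits_a, self.pool_a, self.base.in_features)
        B = self._compose(self.logits_b, self.pool_b, self.base.out_features)
        return self.base(x) + (x @ A.t() @ B) * self.scaling


# Usage: one shared pool pair serves many adapted layers, so only the pools
# and the tiny per-layer routing logits add trainable parameters.
d = 64
pool_a = SharedShardPool(num_shards=16, shard_len=d // 2)
pool_b = SharedShardPool(num_shards=16, shard_len=d // 2)
layer = MoSLinear(nn.Linear(d, d), pool_a, pool_b, rank=8, shards_per_vector=2)
y = layer(torch.randn(4, d))                      # -> shape (4, 64)
```

In the paper's formulation the routing is a hard MoE-style selection and part of each pool is privatized per layer; the soft mixture above is only the simplest differentiable stand-in to show how shared shards can be composed into the LoRA factors.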
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3655