FeDeRA: Efficient Fine-tuning of Language Models in Federated Learning Leveraging Weight Decomposition

14 Sept 2024 (modified: 20 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: federated learning, fine-tuning, low-rank adaptation
Abstract: Federated learning (FL) is a widely used privacy-preserving approach for distributed training that avoids the need to collect data from individual users. In this paper, we investigate fine-tuning pre-trained language models (PLMs) in an FL setting and leverage parameter-efficient fine-tuning (PEFT) methods to reduce computational and communication costs. However, non-IID data in federated learning significantly degrades the performance of PEFT, with the degradation worsening as data heterogeneity increases. To address this, we propose FeDeRA, an FL approach for fine-tuning PLMs that incorporates an effective extension of the low-rank adaptation (LoRA) method. Specifically, FeDeRA initializes the low-rank matrices using Singular Value Decomposition (SVD) on the pre-trained weight matrices, rather than the zero or random initialization used in the original LoRA method. Analyzing weight updates during training reveals that FeDeRA reduces weight oscillations, enabling faster and more efficient fine-tuning of PLMs in FL with non-IID data. Experimental results across multiple NLP tasks and models show that FeDeRA outperforms all PEFT-based baselines in task performance and, in some cases, even matches or exceeds the performance of full-parameter fine-tuning. FeDeRA also greatly enhances training efficiency, reducing training time by up to 97.3% compared to full-parameter fine-tuning and up to 74.6% compared to the fastest PEFT baseline in practical FL settings. Furthermore, FeDeRA demonstrates greater robustness to data heterogeneity than all other PEFT methods, highlighting the effectiveness of its proposed initialization in FL systems.
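To illustrate the key idea described in the abstract (initializing the LoRA factors from an SVD of the pre-trained weight rather than zero/random initialization), here is a minimal PyTorch sketch. It is not the authors' released code; the rank `r`, the function name `federa_style_init`, and the way the singular values are split between the two factors are illustrative assumptions.

```python
# Minimal sketch of SVD-based LoRA initialization, assuming a pre-trained
# weight matrix W of shape (out_features, in_features) and a chosen rank r.
import torch

def federa_style_init(W: torch.Tensor, r: int):
    """Initialize LoRA factors A (r x in) and B (out x r) from the top-r SVD of W."""
    # Thin SVD of the pre-trained weight.
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r, S_r, Vh_r = U[:, :r], S[:r], Vh[:r, :]
    # Split the singular values across the two factors so that B @ A equals
    # the best rank-r approximation of W (instead of LoRA's B = 0, A random).
    B = U_r * S_r.sqrt()                # (out_features, r)
    A = S_r.sqrt().unsqueeze(1) * Vh_r  # (r, in_features)
    return A, B

# Example: adapters for a hypothetical 768x768 projection with rank 8.
W = torch.randn(768, 768)
A, B = federa_style_init(W, r=8)
print(A.shape, B.shape)  # torch.Size([8, 768]) torch.Size([768, 8])
```

In an FL round, only the small factors A and B would be trained and communicated, as with standard LoRA; the abstract does not specify whether the rank-r component is subtracted from the frozen weight, so that detail is left out of this sketch.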
Primary Area: other topics in machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 625