Splitting with Importance-aware Updating for Heterogeneous Federated Learning with Large Language Models
TL;DR: FedICU improves federated fine-tuning of large language models by splitting client updates into shared and unique parts and selectively uploading only important changes to boost generalization and efficiency.
Abstract: Federated learning provides an efficient, privacy-preserving distributed training framework for large language models, addressing the growing scarcity of publicly available training data while enabling the utilization of private datasets. Although integrating large language model fine-tuning with federated learning has emerged as a promising research direction, limited attention has been paid to non-IID instruction-following scenarios. Our key insight is to decompose client updates into consensus and divergence components, enabling the model to maintain core capabilities while adapting to domain-specific knowledge. We propose a novel federated learning framework called **FedICU** (Splitting with **I**mportan**C**e-aware **U**pdating for Heterogeneous **Fed**erated Learning with Large Language Models), which introduces an aggregation mechanism that dynamically balances these components based on their contribution to global model performance, while implementing an importance-aware parameter updating strategy to prevent catastrophic forgetting and domain overfitting. Extensive experiments across diverse domains demonstrate that FedICU significantly outperforms existing federated learning approaches in terms of both generalization performance and domain adaptation. Our code is available at https://github.com/liaosunny123/FedICU.
Lay Summary: Large language models like ChatGPT are powerful tools, but training them requires high-quality datasets. In recent years, publicly available high-quality datasets have been gradually exhausted, and attention has shifted toward private datasets, which have yielded significant results. Federated learning helps by allowing many individuals or organizations to train a shared model without moving their data: each participant trains locally and only sends updates. However, when different users have very different data or needs, this approach can degrade the shared model's performance for everyone. Our research addresses this issue. We designed a new method called **FedICU**, which helps the shared model learn both general knowledge and user-specific needs without sacrificing overall performance. It achieves this by carefully separating what is common across users from what is unique to each one, and then combining these updates in an intelligent way. It also sends only the most important parts of each update, saving both time and computing resources. As a result, we can train powerful language models on private, diverse data while minimizing the loss of generalization caused by heterogeneous datasets and maintaining overall performance across various downstream tasks.
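To make the update-splitting idea more concrete, here is a minimal PyTorch sketch of splitting a client's parameter update into a consensus part and a divergence part and then keeping only the most important entries before upload. The function name, the projection-based split, and the top-k magnitude criterion are illustrative assumptions rather than FedICU's actual algorithm; the real implementation is in the repository linked below.

```python
import torch

def split_and_mask_update(local_delta, global_delta, keep_ratio=0.2):
    """Illustrative sketch: split each client's parameter update into a consensus
    component (aligned with the aggregated/global update direction) and a
    divergence component (the client-specific residual), then keep only the
    largest-magnitude entries of the divergence before upload.

    NOTE: the projection-based split and the top-k magnitude criterion are
    assumptions for illustration, not FedICU's actual importance measure.
    """
    uploaded = {}
    for name, d_local in local_delta.items():
        d_global = global_delta[name]

        # Consensus: projection of the local update onto the global update direction.
        g_vec = d_global.flatten()
        l_vec = d_local.flatten()
        coef = l_vec.dot(g_vec) / (g_vec.dot(g_vec) + 1e-12)
        consensus = coef * d_global

        # Divergence: the client-specific residual left after removing the consensus.
        divergence = d_local - consensus

        # Importance-aware selection: retain only the top-k divergence entries by magnitude.
        k = max(1, int(keep_ratio * divergence.numel()))
        flat_abs = divergence.abs().flatten()
        threshold = flat_abs.kthvalue(flat_abs.numel() - k + 1).values
        mask = (divergence.abs() >= threshold).to(divergence.dtype)
        uploaded[name] = consensus + divergence * mask
    return uploaded


# Toy usage: two parameter tensors standing in for a model's update.
local_delta = {"w": torch.randn(4, 4), "b": torch.randn(4)}
global_delta = {"w": torch.randn(4, 4), "b": torch.randn(4)}
print(split_and_mask_update(local_delta, global_delta)["w"])
```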
Link To Code: https://github.com/liaosunny123/FedICU
Primary Area: Deep Learning->Large Language Models
Keywords: Large Language Model, Federated Learning
Submission Number: 4700