TL;DR: We propose Ferret, the first first-order FL method with shared randomness to enable scalable full-parameter tuning of LLMs across decentralized data sources while maintaining competitive model accuracy.
Abstract: Large Language Models (LLMs) have become indispensable in numerous real-world applications. However, fine-tuning these models at scale, especially in federated settings where data privacy and communication efficiency are critical, presents significant challenges. Existing approaches often resort to parameter-efficient fine-tuning (PEFT) to mitigate communication overhead, but this typically comes at the cost of model accuracy. To address this, we propose *federated full-parameter tuning at scale for LLMs* (Ferret), **the first first-order method with shared randomness** to enable scalable full-parameter tuning of LLMs across decentralized data sources while maintaining competitive model accuracy. Ferret accomplishes this through three aspects: **(i)** it employs widely used first-order methods for efficient local updates; **(ii)** it projects these updates into a low-dimensional space to considerably reduce communication overhead; and **(iii)** it reconstructs local updates from this low-dimensional space with shared randomness to facilitate effective full-parameter global aggregation, ensuring fast convergence and competitive final performance. Our rigorous theoretical analyses and insights, along with extensive experiments, show that Ferret significantly enhances the scalability of existing federated full-parameter tuning approaches by achieving high computational efficiency, reduced communication overhead, and fast convergence, all while maintaining competitive model accuracy. Our implementation is available at [https://github.com/allen4747/Ferret](https://github.com/allen4747/Ferret).
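To make aspects (ii) and (iii) more concrete, below is a minimal sketch of how a local update could be projected into a low-dimensional space using a random basis derived from a seed shared between client and server, and then reconstructed on the server from that same seed. This is an illustration under our own assumptions (a Gaussian random basis and hypothetical helper names such as `project_update` and `reconstruct_update`), not the Ferret implementation; see the linked repository for the actual method.

```python
# Minimal sketch of projection/reconstruction with shared randomness.
# Assumptions (not from the paper): a Gaussian random basis, k = 64 coefficients,
# and the helper names below, which are purely illustrative.
import numpy as np


def random_basis(dim: int, k: int, seed: int) -> np.ndarray:
    """Draw a k x dim random Gaussian basis from a seed shared by client and server."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((k, dim)) / np.sqrt(k)


def project_update(delta: np.ndarray, seed: int, k: int = 64) -> np.ndarray:
    """Client side: compress a full-parameter local update to k coefficients."""
    basis = random_basis(delta.size, k, seed)
    return basis @ delta  # shape (k,): only these coefficients are communicated


def reconstruct_update(coeffs: np.ndarray, dim: int, seed: int) -> np.ndarray:
    """Server side: regenerate the same basis from the shared seed and map back."""
    basis = random_basis(dim, coeffs.size, seed)
    return basis.T @ coeffs  # approximate full-dimensional update for aggregation


if __name__ == "__main__":
    dim, seed = 10_000, 42
    delta = np.random.default_rng(0).standard_normal(dim)  # a client's local update
    coeffs = project_update(delta, seed)                    # send 64 floats, not 10,000
    recon = reconstruct_update(coeffs, dim, seed)           # server-side reconstruction
    print("compression ratio:", dim / coeffs.size)
```

In this sketch the reconstruction is an unbiased but noisy estimate of the original update (with the chosen 1/sqrt(k) scaling, the expectation of the basis-transpose-times-basis product is the identity); how Ferret controls this reconstruction error and aggregates across clients is the subject of the paper's theoretical analysis.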
Lay Summary: (1) Problem: Training large AI models, especially in situations where data is private and spread across many different locations (like hospitals or banks), is a huge challenge. Current methods often make the models less accurate to save on communication costs. We want to train these powerful AI models fully, without sacrificing accuracy, even when data is decentralized and communication is limited.
(2) Solution: We developed Ferret, a new method that allows for full training of these large AI models in a federated setting. Ferret uses efficient local updates, then cleverly compresses these updates into a much smaller size for communication. Crucially, it uses a shared source of randomness, known to both the devices and the central server, to reconstruct the full updates on the server, allowing for accurate and complete model adjustments.
(3) Impact: Ferret makes it possible to train the most powerful AI models on private, distributed data without compromising their performance. This means AI can be deployed more widely and effectively in sensitive areas like healthcare or finance, while respecting data privacy and significantly reducing the time and resources needed for training.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/allen4747/Ferret
Primary Area: Deep Learning->Large Language Models
Keywords: Large Language Models, Federated Full-Parameter Tuning, Scalability, Theoretical Guarantees
Submission Number: 8421