FeedSign: Full-parameter Federated Fine-tuning of Large Models with Extremely Low Communication Overhead of One Bit
Keywords: Large Model Fine-tuning, Federated Learning, Heterogeneity Resilience, Byzantine Resilience
TL;DR: FeedSign: Full-parameter Federated Fine-tuning of Large Models with Extremely Low Communication Overhead of One Bit
Abstract: Federated fine-tuning (FFT) aims to fine-tune a pre-trained model with private data from distributed clients by exchanging models rather than data under the orchestration of a parameter server (PS). However, as large models come to dominate almost every machine learning task, the communication overhead and memory demand surge accordingly, hindering practical deployment on consumer devices. To overcome the bottleneck posed by the growing communication overhead of federated learning and to lower the high memory demand of large model fine-tuning, we propose FeedSign, an FFT algorithm in which a client uploads its local update and downloads the global update for a model of any size using exactly $1$ bit per step, while the memory demand is reduced to that of inference. This is achieved by running zeroth-order (ZO) optimizers on large models and sharing pseudo-random number generators (PRNG) across devices to split each client's gradient estimate into 1) a direction determined by a designated random seed and 2) a binary vote indicating whether that seed-determined direction yields a local loss descent; this vote is the only information a client needs to convey to the PS. We conduct a theoretical analysis of FeedSign and show that it converges at an exponential rate $\mathcal{O}(e^{-t})$, where $t$ is the number of elapsed steps, the same rate that first-order (FO) methods attain in big-$\mathcal{O}$ notation. Moreover, we find that FeedSign is robust to data heterogeneity and Byzantine attacks. We conduct extensive experiments on models of various architectures and sizes (11M to 13B) and find that the proposed method performs comparably to or better than its ZO and FO counterparts, depending on the scenario, despite an orders-of-magnitude lower communication overhead. We also discuss some interesting advantages that arise as byproducts of FeedSign's minimalistic design.
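For intuition, the following is a minimal sketch of one FeedSign-style round as described in the abstract. The function names, the majority-vote aggregation, and the fixed step size are illustrative assumptions rather than the paper's exact specification; only the shared-seed direction and the 1-bit vote follow the description above.

```python
import numpy as np

def client_vote(params, loss_fn, seed, eps=1e-3):
    """Return 1 bit: does the seed-defined direction decrease the local loss?"""
    rng = np.random.default_rng(seed)           # PRNG seed shared with the PS
    z = rng.standard_normal(params.shape)       # seed-corresponding direction
    loss_plus = loss_fn(params + eps * z)       # two forward passes only,
    loss_minus = loss_fn(params - eps * z)      # so memory stays at inference level
    return 1 if loss_plus < loss_minus else -1  # the single uploaded bit

def server_step(params, votes, seed, lr=1e-3):
    """Aggregate 1-bit votes and move along the shared seed-defined direction."""
    rng = np.random.default_rng(seed)           # same seed -> same direction
    z = rng.standard_normal(params.shape)
    sign = 1 if sum(votes) > 0 else -1          # majority vote (an assumption here)
    return params + lr * sign * z               # clients only need the seed and 1 bit

# Toy usage: a quadratic "loss" on a small parameter vector, 5 clients.
params = np.ones(4)
loss = lambda p: float(np.sum(p ** 2))
for seed in range(10):
    votes = [client_vote(params, loss, seed) for _ in range(5)]
    params = server_step(params, votes, seed)
```

In this sketch, the per-step uplink payload is the single vote bit, and the downlink can likewise be reduced to one bit (the aggregated sign), since the direction itself is reconstructed locally from the shared seed.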
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7398