Abstract: Federated Learning (FL) is a promising paradigm for finetuning Large Language Models (LLMs) across distributed data sources while preserving data privacy. However, finetuning such large models on edge devices is challenging due to their high resource demands. Zeroth-order (ZO) optimization estimates gradients through finite-difference approximations, which rely on function evaluations under random perturbations of the model parameters. ZO with task alignment therefore offers a potential solution: it enables finetuning using only forward passes, with inference-level memory requirements and low communication overhead, but it suffers from slow convergence and a higher computational demand. In this paper, we propose a new ZO-based method that reduces the computational cost of using a large number of perturbations while preserving their convergence benefits. We achieve this by splitting the model into consecutive blocks and allocating a higher number of perturbations to the second block, which enables efficient reuse of the intermediate activations to update the full network with fewer forward evaluations. Our evaluation on the RoBERTa-large, OPT-1.3B, and LLaMA-3.2-3B models shows up to $3\times$ reduction in computation compared to other ZO-based techniques, while retaining the memory and communication benefits over first-order federated learning techniques.
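To make the split-perturbation idea concrete, the sketch below shows a two-point (central-difference) ZO gradient estimate and a simplified two-block variant in which the second block receives more perturbations and reuses the cached first-block activation. This is a minimal toy under stated assumptions, not the paper's method: the scalar two-block model, the helper names (`zo_estimate`, `split_zo_step`, `forward_cost`), the Gaussian directions, and the cost model are all hypothetical, and federated aggregation is omitted entirely.

```python
import random

# Hypothetical toy model (illustration only, not the paper's architecture):
#   block 1: h = theta1 . x
#   block 2: y = theta2[0] * h + theta2[1]
#   loss:    (y - target) ** 2

def block1(theta1, x):
    return sum(t * xi for t, xi in zip(theta1, x))

def block2(theta2, h):
    return theta2[0] * h + theta2[1]

def loss_from_h(theta2, h, target):
    return (block2(theta2, h) - target) ** 2

def perturb(theta, z, scale):
    return [t + scale * zi for t, zi in zip(theta, z)]

def zo_estimate(f, theta, eps, num_perturbations, rng):
    """Central-difference ZO gradient estimate of f at theta, averaged over
    Gaussian directions z: mean of (f(theta+eps*z) - f(theta-eps*z)) / (2*eps) * z."""
    grad = [0.0] * len(theta)
    for _ in range(num_perturbations):
        z = [rng.gauss(0.0, 1.0) for _ in theta]
        d = (f(perturb(theta, z, eps)) - f(perturb(theta, z, -eps))) / (2 * eps)
        for i, zi in enumerate(z):
            grad[i] += d * zi / num_perturbations
    return grad

def split_zo_step(theta1, theta2, x, target, eps, p1, p2, rng):
    """Hypothetical split scheme: block 1 gets p1 perturbations, each needing a
    full forward pass; block 2 gets p2 (>= p1) perturbations that only run
    block 2 on the cached, unperturbed block-1 activation h0."""
    h0 = block1(theta1, x)  # computed once and reused by every block-2 evaluation
    g2 = zo_estimate(lambda t2: loss_from_h(t2, h0, target), theta2, eps, p2, rng)
    g1 = zo_estimate(lambda t1: loss_from_h(theta2, block1(t1, x), target),
                     theta1, eps, p1, rng)
    return g1, g2

def forward_cost(c1, c2, p1, p2):
    """Assumed cost model (c1, c2 = cost of one pass through each block):
    returns (naive full-model ZO with p2 perturbations, split scheme)."""
    naive = 2 * p2 * (c1 + c2)
    split = c1 + 2 * p2 * c2 + 2 * p1 * (c1 + c2)
    return naive, split

rng = random.Random(0)
g1, g2 = split_zo_step([0.5, -0.3], [1.2, 0.1], [1.0, 2.0], 1.0,
                       eps=1e-3, p1=4, p2=16, rng=rng)
print(forward_cost(c1=3, c2=1, p1=4, p2=16))  # -> (128, 67)
```

In this toy accounting, most of the perturbations touch only the cheaper second block, so the bulk of the extra evaluations skips the first block. The actual savings in the paper depend on where the split is placed and on how perturbations are allocated, which this sketch does not model.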
Submission Type: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: - Results for using LoRA and QLoRA with FedSPZO (Appendix B)
- Further comparison with first-order AdamW (Appendix A)
- Analysis on split placement (Appendix C)
- Revised citation formatting (\citet and \citep)
- Revised the tradeoff sentence (Section 1) and highlighted task-alignment fine-tuning (Section 4.1) (highlighted in blue)
- Added SQuAD dataset results to the results section (main changes highlighted in blue)
- Improvement in algorithm presentation.
Assigned Action Editor: ~Jinghui_Chen1
Submission Number: 7662