Keywords: Federated Learning, Forward Gradient, Zeroth-Order, Parameter-Efficient Fine-Tuning
TL;DR: FineFed is a forward-only FL framework with shared momentum, uncertainty-guided forward gradients, and forward-only head tuning, accelerating convergence and reducing compute, memory, and communication across vision/NLP benchmarks under non-IID data.
Abstract: Federated learning (FL) on resource-constrained edge devices faces significant challenges when training large transformer models, particularly due to memory and computational limitations. While parameter-efficient fine-tuning (PEFT) methods help reduce memory usage, they still require back-propagation for gradient computation, which often demands more memory than storing the model parameters themselves. Forward-gradient (zeroth-order) FL offers a promising alternative by eliminating back-propagation, but existing methods suffer from computational inefficiency, poor performance on many-class tasks, and unstable convergence under non-IID data distributions.
We present \emph{FineFed}, an efficient forward-only FL framework that addresses these limitations through three key innovations: (i) \textbf{Forward-Only Head Tuning}, which enables exact gradient computation for many-class classification heads without back-propagation; (ii) \textbf{Uncertainty-Guided Forward Gradient Estimation}, which reduces computational cost by approximately $2.5\times$ via uncertainty-guided sample selection and micro-batch perturbations; and (iii) \textbf{Shared Momentum}, which ensures stable local updates and fast convergence under extreme non-IID data heterogeneity. Comprehensive evaluations across NLP and vision datasets demonstrate that FineFed achieves superior model accuracy and system efficiency compared to state-of-the-art methods, making forward-only federated learning practical for real-world deployment.
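To make the forward-gradient (zeroth-order) idea concrete: the gradient can be estimated from forward passes alone by probing the loss along random perturbation directions. The sketch below is a generic SPSA-style estimator illustrating the principle only; the function and parameter names are illustrative and it is not the FineFed algorithm (which additionally uses uncertainty-guided sample selection, micro-batch perturbations, and shared momentum).

```python
import numpy as np

def zeroth_order_grad(loss_fn, params, eps=1e-3, num_samples=4, seed=None):
    """Estimate grad of loss_fn at params using only forward evaluations.

    Averages two-point directional-derivative estimates over random
    Gaussian directions (SPSA-style). Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(params)
    for _ in range(num_samples):
        u = rng.standard_normal(params.shape)  # random probe direction
        # Directional derivative along u from two forward passes,
        # no back-propagation required.
        d = (loss_fn(params + eps * u) - loss_fn(params - eps * u)) / (2 * eps)
        grad += d * u
    return grad / num_samples

# Toy check on a quadratic loss f(x) = ||x||^2, whose true gradient is 2x.
x = np.array([1.0, -2.0, 0.5])
g = zeroth_order_grad(lambda p: float(np.sum(p ** 2)), x,
                      num_samples=200, seed=0)
```

Memory cost is two forward passes per probe and one perturbation vector, which is why zeroth-order methods suit devices where back-propagation activations do not fit; the trade-off is estimator variance, which motivates variance-reduction techniques such as the uncertainty-guided selection described above.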
Our code is available at \url{https://anonymous.4open.science/r/FineFed-0554/}.
Primary Area: optimization
Submission Number: 9100