WingsFL: Speed up Federated Learning via Co-optimization of Communication Frequency and Gradient Compression Ratio
Keywords: Federated Learning; Convex Optimization; Gradient Compression; Infrequent Communication
Abstract: Federated Learning (FL) relies on two key strategies to overcome communication bottlenecks, which otherwise prevent training under low bandwidth and with large numbers of workers. The first strategy is infrequent communication, a core feature of the FedAVG algorithm, controlled by the number of local steps $\tau$. The second is gradient compression, a widely used technique to reduce data volume, governed by a compression ratio $\delta$. However, finding the optimal $(\tau, \delta)$ pair is a major challenge in realistic settings with device heterogeneity and network fluctuations. Existing works assume that the effects of $\delta$ and $\tau$ on model convergence are orthogonal and optimize them separately. In this work, we challenge this orthogonality assumption. We are the first to propose two virtual queues at distinct temporal granularities, which help bound the noise introduced by each of the two lossy strategies. We demonstrate that the convergence rate of FedAVG with gradient compression is critically affected by a key term $2^\tau / \delta^2$. This finding shows that $\tau$ and $\delta$ are intrinsically coupled and must be co-designed for efficient training. Furthermore, we propose WingsFL, which fixes the key convergence-rate term and minimizes the end-to-end training time under device heterogeneity by solving a one-variable Min-Max problem. WingsFL achieves up to $2.00\times$ and $2.18\times$ speed-ups over FedAVG and a static strategy, respectively, under realistic conditions of device heterogeneity and network fluctuations.
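The following is a minimal, illustrative sketch of the co-design idea described in the abstract, not the paper's actual formulation: it fixes the convergence-critical term $2^\tau / \delta^2$ at an assumed constant, derives the compression ratio $\delta$ for each candidate $\tau$, and then solves the one-variable Min-Max problem by picking the $\tau$ that minimizes a simple synchronous-round time model gated by the slowest device. The device profiles, model size, target value of the term, local-step budget, and the cost model itself are all hypothetical assumptions for demonstration.

```python
import math

# Illustrative sketch of the tau/delta co-design (not the paper's exact formulation).
# All constants below are hypothetical assumptions for demonstration only.
DEVICES = [
    {"step_time": 0.08, "bandwidth_mbps": 2.0},
    {"step_time": 0.15, "bandwidth_mbps": 0.8},  # slowest device gates each synchronous round
    {"step_time": 0.05, "bandwidth_mbps": 5.0},
]
MODEL_SIZE_MB = 50.0       # size of an uncompressed model update (assumed)
TARGET_TERM = 64.0         # fixed value of 2**tau / delta**2 (assumed)
TOTAL_LOCAL_STEPS = 1000   # local-iteration budget treated as fixed once the term is fixed (assumed)


def delta_for(tau: int) -> float:
    """Compression ratio delta that keeps 2**tau / delta**2 equal to TARGET_TERM."""
    return math.sqrt(2 ** tau / TARGET_TERM)


def total_time(tau: int) -> float:
    """End-to-end time under a simple synchronous-round model: each of the
    TOTAL_LOCAL_STEPS / tau rounds is gated by the slowest device's compute
    plus compressed-upload time (the inner 'max' of the Min-Max problem)."""
    delta = delta_for(tau)
    round_time = max(
        tau * d["step_time"] + delta * MODEL_SIZE_MB / d["bandwidth_mbps"]
        for d in DEVICES
    )
    return (TOTAL_LOCAL_STEPS / tau) * round_time


# Outer one-variable minimization over tau; delta must remain a valid ratio in (0, 1].
feasible_taus = [t for t in range(1, 16) if delta_for(t) <= 1.0]
best_tau = min(feasible_taus, key=total_time)
print(f"tau={best_tau}, delta={delta_for(best_tau):.3f}, est. time={total_time(best_tau):.0f}s")
```

Even in this toy model the optimum is non-trivial: very small $\tau$ wastes time on frequent (lightly compressed) uploads, while very large $\tau$ forces a large $\delta$ to keep the term fixed, so the intermediate $\tau$ that balances the two wins.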
Primary Area: optimization
Submission Number: 5800