- TL;DR: We propose an efficient and robust asynchronous federated learning algorithm for training in the presence of stragglers.
- Abstract: We address the efficiency issues caused by the straggler effect in federated learning, a recently emerged paradigm that collaboratively trains a model on decentralized non-i.i.d. (not independent and identically distributed) data across a massive number of worker devices, without exchanging training data, over unreliable and heterogeneous networks. We propose a novel two-stage analysis of the error bounds of general federated learning, which provides practical insights into optimization. Building on this analysis, we propose a novel, easy-to-implement federated learning algorithm that operates asynchronously. It uses strategies that control the discrepancy between the global model and delayed models, and it adjusts the number of local epochs based on estimated staleness, in order to accelerate convergence and resist the performance deterioration caused by stragglers. Experimental results show that our algorithm converges quickly and remains robust in the presence of a large number of stragglers.
- Keywords: federated learning, straggler effect, distributed machine learning, distributed optimization
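
The abstract names two mechanisms without giving formulas: limiting how far a delayed model can pull the global model, and adjusting local epochs from estimated staleness. The following is a minimal sketch of how such mechanisms could look, not the paper's actual update rules. The function names, the polynomial discount, the `base`, `decay`, `max_epochs`, and `min_epochs` parameters, and the choice to give staler workers fewer epochs are all assumptions made for illustration.

```python
from typing import List


def mixing_weight(staleness: int, base: float = 0.5, decay: float = 0.5) -> float:
    """Assumed rule: discount a delayed update polynomially in its staleness.

    staleness is the number of global updates that happened since the worker
    last pulled the global model. Staler updates get a smaller weight.
    """
    return base / (1.0 + staleness) ** decay


def merge(global_w: List[float], local_w: List[float], staleness: int,
          base: float = 0.5, decay: float = 0.5) -> List[float]:
    """Asynchronously fold one worker's model into the global model.

    The server does not wait for other workers; it takes a convex
    combination of the current global model and the arriving local model.
    """
    alpha = mixing_weight(staleness, base, decay)
    return [(1.0 - alpha) * g + alpha * l for g, l in zip(global_w, local_w)]


def local_epochs(estimated_staleness: int, max_epochs: int = 5,
                 min_epochs: int = 1) -> int:
    """Assumed rule: workers expected to be staler run fewer local epochs,
    so their models drift less from the global model before reporting back.
    """
    return max(min_epochs, max_epochs // (1 + estimated_staleness))


if __name__ == "__main__":
    global_model = [0.0, 0.0]
    delayed_model = [1.0, 2.0]
    # A fresh update moves the global model further than a stale one.
    print(merge(global_model, delayed_model, staleness=0))
    print(merge(global_model, delayed_model, staleness=3))
    print([local_epochs(s) for s in (0, 1, 4, 10)])
```

In this sketch a stale update still contributes, only with reduced weight, which is one way an asynchronous server can keep making progress when many workers report late.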