Following the Correct Direction: Renovating Sparsified SGD Towards Global Optimization in Distributed Edge Learning

Abstract: Distributed edge learning coordinates powerful edge devices to collaboratively train a shared global model. Since frequent communication between the server and workers is expensive, accelerating the learning process is highly desirable. Gradient sparsification is an efficient method that uploads only a small subset of gradient elements. However, most existing works neglect the distributed nature of local datasets; as a result, the sparsified local gradients uploaded by edge devices deviate from the correct global optimization direction, causing a loss of accuracy. In this paper, we propose a new gradient sparsification scheme with a renovating mechanism, called Global Renovating Stochastic Gradient Descent (GRSGD). GRSGD uses the previous-round global gradient as an estimate of the current one and renovates the coordinates of the local gradients that are zeroed out by sparsification. This mitigates communication overhead while keeping the training direction closer to the global optimum, thereby accelerating distributed edge learning. We provide a theoretical convergence guarantee for our algorithm under non-convex assumptions, which better fit most deep learning problems. Extensive experiments in PyTorch show that GRSGD effectively accelerates learning with lower communication cost and faster convergence on most training tasks. For example, when training MnasNet on ImageNet, GRSGD cuts the uploaded gradient size from 8.47 MB to 2.13 MB while achieving more than 9.6% higher accuracy.
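The abstract does not spell out the exact renovation rule, so the sketch below is only a minimal illustration of the idea it describes: a top-k sparsifier is assumed, and the coordinates dropped by sparsification are filled in with the previous-round global gradient. The function names and the parameter k are hypothetical and not taken from the paper.

```python
import torch

def topk_sparsify(grad, k):
    """Keep only the k largest-magnitude entries of a gradient tensor."""
    flat = grad.flatten()
    mask = torch.zeros_like(flat, dtype=torch.bool)
    _, idx = torch.topk(flat.abs(), k)
    mask[idx] = True
    sparse = torch.where(mask, flat, torch.zeros_like(flat))
    return sparse.view_as(grad), mask.view_as(grad)

def renovate_local_gradient(local_grad, prev_global_grad, k):
    """Hypothetical renovation step: coordinates zeroed out by top-k
    sparsification are replaced with the previous-round global gradient,
    so the resulting update stays closer to the global direction.
    Only the sparse part would actually be uploaded; the substitution
    reuses the stale global gradient already known to both sides."""
    sparse_grad, kept = topk_sparsify(local_grad, k)
    return torch.where(kept, sparse_grad, prev_global_grad)
```

As a usage sketch, `renovate_local_gradient(g_local, g_global_prev, k=len(g_local.flatten()) // 100)` would upload roughly 1% of the coordinates and pad the rest with the stale global estimate instead of zeros.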