Abstract: Most distributed machine learning (ML) systems store a copy of the model parameters locally on each machine to minimize network communication. In practice, to reduce synchronization waiting time, these copies of the model are not necessarily updated in lock-step and can become stale. Despite much development in large-scale ML, the effect of staleness on learning efficiency remains inconclusive, mainly because it is challenging to control or monitor staleness in complex distributed environments. In this work, we study the convergence behaviors of a wide array of ML models and algorithms under delayed updates. Our extensive experiments reveal the rich diversity of the effects of staleness on the convergence of ML algorithms and offer insights into seemingly contradictory reports in the literature. The empirical findings also inspire a new convergence analysis of SGD in non-convex optimization under staleness, matching the best-known convergence rate of $O(1/\sqrt{T})$.
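To make the setting concrete, below is a minimal sketch (not the authors' code) of how delayed updates can be simulated in plain SGD: each applied gradient is computed on parameters that are a fixed number of steps old, mimicking stale local copies in non-synchronous execution. The function and parameter names (`stale_sgd`, `grad_fn`, `staleness`) and the toy objective are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch only: SGD where every applied gradient is computed
# from parameters `staleness` steps old, emulating delayed updates.
import numpy as np

def stale_sgd(grad_fn, x0, lr=0.05, staleness=4, steps=1000):
    """Run SGD with a fixed delay: gradients are evaluated on stale parameters."""
    x = x0.copy()
    history = [x0.copy()] * (staleness + 1)  # buffer of past parameter copies
    for _ in range(steps):
        x_stale = history[0]                 # oldest copy in the buffer
        g = grad_fn(x_stale)                 # gradient computed on stale parameters
        x = x - lr * g                       # apply the delayed gradient to current params
        history = history[1:] + [x.copy()]   # slide the staleness window forward
    return x

# Toy non-convex objective f(x) = sum(x^2 + sin(3x)); grad below is its gradient.
grad = lambda x: 2 * x + 3 * np.cos(3 * x)
print(stale_sgd(grad, x0=np.ones(10), staleness=8))
```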
TL;DR: Empirical and theoretical study of the effects of staleness in non-synchronous execution on machine learning algorithms.
Data: [CIFAR-10](https://paperswithcode.com/dataset/cifar-10), [MovieLens](https://paperswithcode.com/dataset/movielens)