Abstract: We study local SGD (also known as parallel SGD and federated averaging), a natural and frequently
used stochastic distributed optimization method. Its theoretical foundations are currently lacking, and we highlight that all existing error guarantees in the convex setting are dominated by a simple baseline, minibatch SGD. (1) For quadratic objectives we prove that local SGD strictly dominates minibatch SGD and that accelerated local SGD is minimax optimal; (2) For general convex objectives we provide the first guarantee that at least sometimes improves over minibatch SGD; (3) We show that local SGD does not, in fact, dominate minibatch SGD, by presenting a lower bound on its performance that is worse than the minibatch SGD guarantee.
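For readers unfamiliar with the two methods being compared, the following is a minimal sketch (not the authors' code) contrasting local SGD with minibatch SGD on a toy quadratic objective f(x) = ½ xᵀAx with synthetic Gaussian gradient noise. All names and parameter choices here (run_local_sgd, run_minibatch_sgd, A, sigma, M, K, R, eta) are illustrative assumptions; both methods are given the same budget of M·K stochastic gradients per communication round.

```python
# Illustrative sketch only: local SGD vs. minibatch SGD on a quadratic.
import numpy as np

rng = np.random.default_rng(0)
d = 10                                   # dimension
A = np.diag(np.linspace(0.1, 1.0, d))    # curvature of the quadratic
sigma = 1.0                              # gradient noise level
M, K, R = 8, 10, 50                      # machines, local steps per round, rounds
eta = 0.1                                # step size


def stochastic_grad(x):
    """Unbiased stochastic gradient of f(x) = 0.5 * x^T A x."""
    return A @ x + sigma * rng.standard_normal(d)


def run_local_sgd(x0):
    """Each of M machines runs K local SGD steps; iterates are then averaged."""
    x = x0.copy()
    for _ in range(R):
        local_iterates = []
        for _ in range(M):
            y = x.copy()
            for _ in range(K):
                y -= eta * stochastic_grad(y)
            local_iterates.append(y)
        x = np.mean(local_iterates, axis=0)   # one communication: average iterates
    return x


def run_minibatch_sgd(x0):
    """One SGD step per round using a minibatch of M*K gradients at the shared iterate."""
    x = x0.copy()
    for _ in range(R):
        g = np.mean([stochastic_grad(x) for _ in range(M * K)], axis=0)
        x -= eta * g
    return x


x0 = rng.standard_normal(d)
f = lambda x: 0.5 * x @ A @ x
print("local SGD     f(x):", f(run_local_sgd(x0)))
print("minibatch SGD f(x):", f(run_minibatch_sgd(x0)))
```

The comparison in the abstract is between these two strategies under equal gradient and communication budgets: minibatch SGD spends all M·K samples on one accurate step per round, while local SGD takes K noisier steps per machine between communications.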