Abstract: We study the consensus decentralized optimization problem, where the objective
function is the average of n agents' private non-convex cost functions and the
agents can only communicate with their neighbors on a given network topology. We
consider the stochastic learning setting, in which each agent can only access
a noisy estimate of its gradient. Many decentralized methods can solve such a problem,
including EXTRA, Exact-Diffusion/D2, and gradient-tracking. Unlike the famed Dsgd
algorithm, these methods have been shown to be robust to heterogeneity across
the local cost functions. However, the established convergence rates for these methods
indicate that their sensitivity to the network topology is worse than that of Dsgd. These
theoretical results imply that such methods can perform much worse than Dsgd over
sparse networks, which contradicts empirical experiments where Dsgd is
observed to be more sensitive to the network topology.
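For concreteness, the setup described above is typically formalized as follows (the notation here is illustrative and not taken from the paper):
$$\min_{x \in \mathbb{R}^d} \; f(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x), \qquad f_i(x) = \mathbb{E}_{\xi_i}\big[F_i(x;\xi_i)\big],$$
where each agent $i$ can only query noisy stochastic gradients $\nabla F_i(x;\xi_i)$ of its private cost $f_i$ and can only exchange iterates with its neighbors on the given network graph.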
In this work, we study a general stochastic unified decentralized algorithm (SUDA)
that includes the above methods as special cases. We establish the convergence of
SUDA under both the non-convex setting and the Polyak-Łojasiewicz condition. Our results provide improved network-topology-dependent bounds for these methods (such as
Exact-Diffusion/D2 and gradient-tracking) compared with the existing literature. Moreover, our results show that these methods are often less sensitive to the network topology than Dsgd, which agrees with numerical experiments.