Abstract: We study the consensus decentralized optimization problem where the objective function is the average of <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$n$</tex-math></inline-formula> agents private non-convex cost functions; moreover, the agents can only communicate to their neighbors on a given network topology. The stochastic learning setting is considered in this paper where each agent can only access a noisy estimate of its gradient. Many decentralized methods can solve such problem including EXTRA, Exact-Diffusion/D <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$^{2}$</tex-math></inline-formula> , and gradient-tracking. Unlike the famed <sc xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Dsgd</small> algorithm, these methods have been shown to be robust to the heterogeneity across the local cost functions. However, the established convergence rates for these methods indicate that their sensitivity to the network topology is worse than <sc xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Dsgd</small> . Such theoretical results imply that these methods can perform much worse than <sc xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Dsgd</small> over sparse networks, which, however, contradicts empirical experiments where <sc xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Dsgd</small> is observed to be more sensitive to the network topology. In this work, we study a general <underline xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">s</u> tochastic <underline xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">u</u> nified <underline xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">d</u> ecentralized <underline xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">a</u> lgorithm ( <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">SUDA</b> ) that includes the above methods as special cases. We establish the convergence of <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">SUDA</b> under both non-convex and the Polyak-Łojasiewicz condition settings. Our results provide improved network topology dependent bounds for these methods (such as Exact-Diffusion/D <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$^{2}$</tex-math></inline-formula> and gradient-tracking) compared with existing literature. Moreover, our results show that these methods are often less sensitive to the network topology compared to <sc xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Dsgd</small> , which agrees with numerical experiments.
0 Replies
Loading