Abstract: In this paper, we address the problem of stochastic optimization over distributed processing networks, motivated by machine learning applications performed in data centers. In this problem, each of n nodes in a network receives stochastic realizations of a private function f_i(x) and aims to reach a common value that minimizes Σ_{i=1}^n f_i(x) via local updates and communication with its neighbors. We focus on zeroth-order methods, in which only function values of the stochastic realizations can be used. Such methods, also called derivative-free, are especially important for solving real-world problems where the (sub)gradients of the loss functions are either inaccessible or inefficient to evaluate. To this end, we propose a method called Distributed Stochastic Alternating Direction Method of Multipliers (DS-ADMM), which can use two kinds of gradient estimators under different assumptions. The convergence rates of DS-ADMM are O(n√(k log(2k)/T)) for general convex loss functions and O(nk log(2kT)/T) for strongly convex functions in terms of the optimality gap, where k is the dimension of the domain and T is the time horizon of the algorithm. The rates improve to O(n/√T) and O(n log T/T) if the objective functions have Lipschitz gradients. All of these results improve upon previous distributed zeroth-order methods. Lastly, we demonstrate the performance of DS-ADMM via experiments on two examples, distributed online least squares and distributed support vector machines, arising in estimation and classification tasks.
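To illustrate the zeroth-order setting described above, here is a minimal sketch of a standard two-point gradient estimator that uses only function evaluations. This is a generic derivative-free estimator for illustration, not the paper's specific DS-ADMM construction; the function name and smoothing parameter `delta` are our own choices.

```python
import numpy as np

def two_point_grad_estimate(f, x, delta=1e-4, rng=None):
    """Generic two-point zeroth-order gradient estimator (illustrative):
        g = k * (f(x + delta*u) - f(x - delta*u)) / (2*delta) * u,
    where u is a random unit vector and k = len(x). In expectation this
    approximates the gradient of a smoothed version of f, using only
    two function evaluations and no derivative information."""
    rng = rng or np.random.default_rng(0)
    k = x.size
    u = rng.standard_normal(k)
    u /= np.linalg.norm(u)  # uniform random direction on the unit sphere
    finite_diff = (f(x + delta * u) - f(x - delta * u)) / (2 * delta)
    return k * finite_diff * u

# Usage: estimate the gradient of f(x) = ||x||^2 at x0 (true gradient = 2*x0).
x0 = np.array([1.0, -2.0, 0.5])
g = two_point_grad_estimate(lambda z: float(z @ z), x0)
```

A single estimate is noisy, but averaging many independent estimates converges to the true gradient, which is what makes such estimators usable inside stochastic optimization loops like the one the paper studies.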