Abstract: In this article, we study the communication and (sub)gradient computation costs in distributed optimization. We present two algorithms based on the framework of the accelerated penalty method with increasing penalty parameters. Our first algorithm is for smooth distributed optimization. For $L$-smooth convex problems, it obtains the near-optimal $O\left(\sqrt{\frac{L}{\epsilon(1-\sigma_2(W))}}\log\frac{1}{\epsilon}\right)$ communication complexity and the optimal $O\left(\sqrt{\frac{L}{\epsilon}}\right)$ gradient computation complexity, where $\sigma_2(W)$ denotes the second largest singular value of the weight matrix $W$ associated with the network and $\epsilon$ is the target accuracy. When the problem is $\mu$-strongly convex and $L$-smooth, our algorithm has the near-optimal $O\left(\sqrt{\frac{L}{\mu(1-\sigma_2(W))}}\log^2\frac{1}{\epsilon}\right)$ complexity for communications and the optimal $O\left(\sqrt{\frac{L}{\mu}}\log\frac{1}{\epsilon}\right)$ complexity for gradient computations. Our communication complexities are worse than the lower bounds by only a factor of $\log\frac{1}{\epsilon}$. Our second algorithm is designed for nonsmooth distributed optimization, and it achieves both the optimal $O\left(\frac{1}{\epsilon\sqrt{1-\sigma_2(W)}}\right)$ communication complexity and the optimal $O\left(\frac{1}{\epsilon^2}\right)$ subgradient computation complexity, which match the lower bounds for nonsmooth distributed optimization.
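All of the communication bounds above depend on the network only through the spectral gap $1-\sigma_2(W)$. As an illustration that is not taken from the article, the sketch below assumes a hypothetical ring network with Metropolis weights (each node puts weight 1/3 on itself and on each of its two neighbours). It computes $\sigma_2(W)$ with NumPy and evaluates the leading terms of the smooth convex bounds with all constants dropped. The helper names `ring_metropolis_weights`, `sigma2` and `smooth_bounds` are hypothetical and do not describe the authors' algorithm.

```python
import numpy as np

def ring_metropolis_weights(n: int) -> np.ndarray:
    """Hypothetical example network: a ring of n >= 3 nodes.
    Each node gives weight 1/3 to itself and to each of its two neighbours,
    so W is symmetric and doubly stochastic."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0  # j % n wraps the ring around
    return W

def sigma2(W: np.ndarray) -> float:
    """Second largest singular value of W.
    np.linalg.svd returns singular values in descending order."""
    s = np.linalg.svd(W, compute_uv=False)
    return float(s[1])

def smooth_bounds(L: float, eps: float, s2: float) -> tuple[float, float]:
    """Leading terms of the smooth convex bounds, constants dropped (assumption):
    communications ~ sqrt(L / (eps * (1 - sigma2))) * log(1/eps)
    gradients      ~ sqrt(L / eps)"""
    comms = np.sqrt(L / (eps * (1.0 - s2))) * np.log(1.0 / eps)
    grads = np.sqrt(L / eps)
    return float(comms), float(grads)

if __name__ == "__main__":
    W = ring_metropolis_weights(10)
    s2 = sigma2(W)  # about 0.873, i.e. 1/3 + (2/3) cos(2*pi/10)
    comms, grads = smooth_bounds(L=1.0, eps=1e-4, s2=s2)
    print(f"sigma2(W) = {s2:.4f}, comms/grads = {comms / grads:.2f}")
```

The ratio of the two estimates is $\log\frac{1}{\epsilon}/\sqrt{1-\sigma_2(W)}$. This shows why a poorly connected network (with $\sigma_2(W)$ close to 1) inflates the communication count while leaving the gradient count unchanged.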