Keywords: distributed optimization, minimax optimization, second-order similarity
Abstract: This paper considers distributed convex-concave minimax optimization under second-order similarity.
We propose the stochastic variance-reduced optimistic gradient sliding (SVOGS) method, which takes advantage of the finite-sum structure of the objective through mini-batch client sampling and variance reduction.
We prove that SVOGS achieves an $\varepsilon$-duality gap within
${\mathcal O}(\delta D^2/\varepsilon)$ communication rounds,
with communication complexity of ${\mathcal O}(n+\sqrt{n}\delta D^2/\varepsilon)$
and local gradient calls of
$\tilde{\mathcal O}(n+(\sqrt{n}\delta+L)D^2/\varepsilon\log(1/\varepsilon))$,
where $n$ is the number of nodes, $\delta$ is the degree of the second-order similarity, $L$ is the smoothness parameter, and $D$ is the diameter of the constraint set.
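For context, here is a minimal sketch of the standard quantities these bounds refer to, under the usual conventions in this literature (the paper's precise definitions may differ in details such as norms or expectations): the duality gap of a candidate pair $(\hat x,\hat y)\in\mathcal X\times\mathcal Y$ and the $\delta$-second-order (Hessian) similarity of the local components $f_i$ of the finite-sum objective $f=\frac{1}{n}\sum_{i=1}^{n}f_i$:
$$\mathrm{Gap}(\hat x,\hat y)=\max_{y\in\mathcal Y} f(\hat x,y)-\min_{x\in\mathcal X} f(x,\hat y),\qquad \big\|\nabla^2 f_i(z)-\nabla^2 f(z)\big\|\le\delta\ \ \text{for all } z \text{ and } i\in[n].$$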
We verify that all of the above complexities (nearly) match the corresponding lower bounds.
For the specific $\mu$-strongly-convex-$\mu$-strongly-concave case,
our algorithm achieves upper bounds on communication rounds, communication complexity, and local gradient calls of $\mathcal O(\delta/\mu\log(1/\varepsilon))$, ${\mathcal O}((n+\sqrt{n}\delta/\mu)\log(1/\varepsilon))$, and $\tilde{\mathcal O}((n+(\sqrt{n}\delta+L)/\mu)\log(1/\varepsilon))$, respectively, which are also nearly tight.
Furthermore, we conduct numerical experiments to demonstrate the empirical advantages of the proposed method.
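To make the algorithmic ingredients concrete, below is a minimal, hypothetical Python sketch of one building block: a variance-reduced extragradient update with mini-batch client sampling on a toy bilinear saddle-point problem. It is not the paper's SVOGS method (in particular it omits the gradient-sliding inner loop); all names and parameters are assumptions made for illustration.

# Illustrative sketch only: variance-reduced extragradient with mini-batch
# client sampling on a toy bilinear saddle-point problem. NOT the paper's
# SVOGS algorithm; n_clients, batch, eta, T, and the update rule are assumed.
import numpy as np

rng = np.random.default_rng(0)
n_clients, d, batch, eta, T = 20, 5, 4, 0.05, 500

# Local bilinear objectives f_i(x, y) = x^T A_i y + b_i^T x - c_i^T y.
A = rng.normal(size=(n_clients, d, d))
b = rng.normal(size=(n_clients, d))
c = rng.normal(size=(n_clients, d))

def local_op(i, x, y):
    """Saddle-point operator F_i(z) = (grad_x f_i, -grad_y f_i)."""
    return np.concatenate([A[i] @ y + b[i], c[i] - A[i].T @ x])

def full_op(x, y):
    """Average operator F(z) = (1/n) sum_i F_i(z)."""
    return np.mean([local_op(i, x, y) for i in range(n_clients)], axis=0)

x, y = np.zeros(d), np.zeros(d)
w_x, w_y = x.copy(), y.copy()          # snapshot (reference) point
F_w = full_op(w_x, w_y)                # full operator at the snapshot

for t in range(T):
    S = rng.choice(n_clients, size=batch, replace=False)  # sampled clients

    def vr_op(px, py):
        """Variance-reduced estimate: F(w) + mean_{i in S}(F_i(z) - F_i(w))."""
        corr = np.mean([local_op(i, px, py) - local_op(i, w_x, w_y) for i in S],
                       axis=0)
        return F_w + corr

    # Extragradient (a close relative of the optimistic step): a half step,
    # then a full step using the operator evaluated at the half point.
    g = vr_op(x, y)
    x_half, y_half = x - eta * g[:d], y - eta * g[d:]
    g_half = vr_op(x_half, y_half)
    x, y = x - eta * g_half[:d], y - eta * g_half[d:]

    if t % 100 == 0:                    # occasionally refresh the snapshot
        w_x, w_y, F_w = x.copy(), y.copy(), full_op(x, y)

print("residual ||F(z)||:", np.linalg.norm(full_op(x, y)))

The estimator $F(w)+\frac{1}{|S|}\sum_{i\in S}(F_i(z)-F_i(w))$ is unbiased for $F(z)$, and its variance shrinks as the iterate approaches the snapshot $w$; this is the mechanism by which mini-batch client sampling can reduce communication without losing accuracy.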
Supplementary Material: zip
Primary Area: Optimization (convex and non-convex, discrete, stochastic, robust)
Submission Number: 3977