SGDA with shuffling: faster convergence for nonconvex-PŁ minimax optimization

Hanseul Cho; Chulhee Yun

Published: 01 Feb 2023, Last Modified: 20 Feb 2023, ICLR 2023 poster, Readers: Everyone

Keywords: minimax optimization, SGDA, without-replacement sampling, random reshuffling, Polyak-Łojasiewicz

TL;DR: We study the convergence bounds of (mini-batch) SGDA with random reshuffling for nonconvex-PŁ and primal-PŁ-PŁ problems.

Abstract: Stochastic gradient descent-ascent (SGDA) is one of the main workhorses for solving finite-sum minimax optimization problems. Most practical implementations of SGDA randomly reshuffle components and sequentially use them (i.e., without-replacement sampling); however, there are few theoretical results on this approach for minimax algorithms, especially outside the easier-to-analyze (strongly-)monotone setups. To narrow this gap, we study the convergence bounds of SGDA with random reshuffling (SGDA-RR) for smooth nonconvex-nonconcave objectives with Polyak-Łojasiewicz (PŁ) geometry. We analyze both simultaneous and alternating SGDA-RR for nonconvex-PŁ and primal-PŁ-PŁ objectives, and obtain convergence rates faster than with-replacement SGDA. Our rates extend to mini-batch SGDA-RR, recovering known rates for full-batch gradient descent-ascent (GDA). Lastly, we present a comprehensive lower bound for GDA with an arbitrary step-size ratio, which matches the full-batch upper bound for the primal-PŁ-PŁ case.
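
The SGDA-RR update described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `sgda_rr`, the argument names, the step sizes `alpha` and `beta`, and the toy objective are all assumptions made for illustration. Each epoch draws a fresh permutation of the n components and visits each exactly once; the simultaneous variant evaluates both gradients at the current (x, y), while the alternating variant evaluates the ascent gradient at the already-updated x. The toy problem is a simple strongly-convex-strongly-concave quadratic, chosen only so that the iterates visibly approach the saddle point at (0, 0).

```python
import numpy as np

def sgda_rr(grad_x, grad_y, x, y, n, epochs, alpha, beta,
            alternating=False, seed=0):
    """Sketch of SGDA with random reshuffling (hypothetical helper).

    grad_x(i, x, y) and grad_y(i, x, y) return the partial gradients
    of the i-th component f_i. alpha is the descent step size for x,
    beta is the ascent step size for y.
    """
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        perm = rng.permutation(n)  # fresh shuffle at the start of each epoch
        for i in perm:             # each component is used exactly once
            x_new = x - alpha * grad_x(i, x, y)
            # Alternating: the ascent step sees the updated x.
            # Simultaneous: the ascent step sees the old x.
            x_for_y = x_new if alternating else x
            y = y + beta * grad_y(i, x_for_y, y)
            x = x_new
    return x, y

# Toy finite-sum problem (an assumption, not taken from the paper):
#   f_i(x, y) = 0.5*x**2 + x*y - 0.5*y**2 + c[i]*x + d[i]*y,
# where c and d each sum to zero, so the averaged objective has its
# saddle point at (0, 0).
n = 8
c = np.linspace(-1.0, 1.0, n)
d = c[::-1].copy()

def gx(i, x, y):
    return x + y + c[i]

def gy(i, x, y):
    return x - y + d[i]

x_sim, y_sim = sgda_rr(gx, gy, 1.0, 1.0, n, epochs=300,
                       alpha=0.01, beta=0.01)
x_alt, y_alt = sgda_rr(gx, gy, 1.0, 1.0, n, epochs=300,
                       alpha=0.01, beta=0.01, alternating=True)
print("simultaneous:", x_sim, y_sim)
print("alternating: ", x_alt, y_alt)
```

Replacing `rng.permutation(n)` with `rng.integers(0, n, size=n)` would give the with-replacement baseline that the abstract compares against; the mini-batch variant would instead split each permutation into consecutive blocks and average the component gradients within each block.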

Area: Optimization (e.g., convex and non-convex optimization)
