Adaptive Extrapolated Proximal Gradient Methods with Variance Reduction for Composite Nonconvex Finite-Sum Minimization
Keywords: Convergence Analysis; Stochastic Optimization; Variance Reduction; Non-convex Optimization
Abstract: This paper proposes {\sf AEPG-SPIDER}, an Adaptive Extrapolated Proximal Gradient (AEPG) method with variance reduction for minimizing composite nonconvex finite-sum functions. It integrates three acceleration techniques: adaptive stepsizes, Nesterov's extrapolation, and the recursive stochastic path-integrated estimator SPIDER. Unlike existing methods that adjust the stepsize factor using historical gradients, {\sf AEPG-SPIDER} bases its stepsize update on past iterate differences. While targeting stochastic finite-sum problems, {\sf AEPG-SPIDER} simplifies to {\sf AEPG} in the full-batch, non-stochastic setting, which is also of independent interest. To our knowledge, {\sf AEPG-SPIDER} and {\sf AEPG} are the first Lipschitz-free methods to achieve optimal iteration complexity for this class of \textit{composite} minimization problems. Specifically, {\sf AEPG} achieves the optimal iteration complexity of $\mathcal{O}(N \epsilon^{-2})$, while {\sf AEPG-SPIDER} achieves $\mathcal{O}(N + \sqrt{N} \epsilon^{-2})$ for finding $\epsilon$-approximate stationary points, where $N$ is the number of component functions. Under the Kurdyka-Łojasiewicz (KL) assumption, we establish non-ergodic convergence rates for both methods. Preliminary experiments on sparse phase retrieval and linear eigenvalue problems demonstrate the superior performance of {\sf AEPG-SPIDER} and {\sf AEPG} compared to existing methods.
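To make the abstract's three ingredients concrete, the following is a minimal, hypothetical Python sketch of one possible loop combining a SPIDER-style recursive gradient estimator, Nesterov-type extrapolation, a proximal step, and a stepsize driven by accumulated iterate differences. It is not the paper's {\sf AEPG-SPIDER} update: all parameter names (q, b, beta, gamma0) and the concrete stepsize rule are illustrative assumptions, and the actual method is specified in the paper.

```python
import numpy as np

def aepg_spider_sketch(grad_batch, prox, x0, N, T,
                       q=None, b=None, beta=0.5, gamma0=1.0, rng=None):
    """Illustrative sketch only; not the paper's exact algorithm.

    grad_batch(x, idx) : mean gradient of the components indexed by `idx` at x
    prox(v, gamma)     : proximal operator of the nonsmooth term with step gamma
    """
    rng = np.random.default_rng() if rng is None else rng
    q = q or max(1, int(np.sqrt(N)))   # full-gradient refresh period (assumed)
    b = b or max(1, int(np.sqrt(N)))   # mini-batch size (assumed)
    x_prev = x = np.asarray(x0, dtype=float)
    y_prev = x.copy()
    acc = 0.0                          # accumulated squared iterate differences
    for t in range(T):
        # Nesterov-style extrapolation from the previous two iterates
        y = x + beta * (x - x_prev)
        # SPIDER: recursive estimator, refreshed with a full gradient every q steps
        if t % q == 0:
            v = grad_batch(y, np.arange(N))
        else:
            idx = rng.choice(N, size=b, replace=False)
            v = v + grad_batch(y, idx) - grad_batch(y_prev, idx)
        # Adaptive stepsize built from past iterate differences
        # (AdaGrad-like surrogate; the paper's rule differs)
        acc += float(np.sum((x - x_prev) ** 2))
        gamma = gamma0 / np.sqrt(1.0 + acc)
        # Proximal gradient step taken from the extrapolated point
        x_prev, x = x, prox(y - gamma * v, gamma)
        y_prev = y
    return x
```

In this sketch the nonsmooth part of the composite objective (e.g., an $\ell_1$ regularizer) enters only through `prox`, and no Lipschitz constant is supplied, mirroring the Lipschitz-free flavor described in the abstract.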
Supplementary Material: zip
Primary Area: optimization
Submission Number: 15170