SERENA: A Unified Stochastic Recursive Variance Reduced Gradient Framework for Riemannian Non-Convex Optimization

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · License: CC BY 4.0
Abstract: Recently, the extension of Variance Reduction (VR) to Riemannian stochastic non-convex optimization has attracted increasing interest. Inspired by recursive momentum, we first introduce the Stochastic Recursive Variance Reduced Gradient (SRVRG) algorithm and further present the Stochastic Recursive Gradient Estimator (SRGE) in Euclidean spaces, which unifies the prevailing variance-reduction estimators. We then extend SRGE to Riemannian spaces, yielding a unified Stochastic rEcursive vaRiance reducEd gradieNt frAmework (SERENA) for Riemannian non-convex optimization. This framework includes the proposed R-SRVRG, R-SVRRM, and R-Hybrid-SGD methods, as well as existing Riemannian VR methods. Furthermore, we establish a unified theoretical analysis for Riemannian non-convex optimization under retraction and vector transport. The IFO complexity of the proposed R-SRVRG and R-SVRRM to converge to an $\varepsilon$-accurate solution is $\mathcal{O}\left(\min\{n^{1/2}\varepsilon^{-2}, \varepsilon^{-3}\}\right)$ in the finite-sum setting and $\mathcal{O}\left(\varepsilon^{-3}\right)$ in the online case, both of which match the lower bound on IFO complexity. Experimental results indicate that the proposed algorithms outperform existing Riemannian optimization methods.
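For context, a minimal sketch of the recursive-momentum estimator (in the style of STORM) that the abstract cites as inspiration; this is the standard form from the variance-reduction literature, not necessarily the exact SRGE recursion of the paper. Given a stochastic gradient $\nabla f(x_t; \xi_t)$, step size $\eta > 0$, and momentum weight $\beta_t \in [0, 1]$:

$$v_t = \nabla f(x_t; \xi_t) + (1 - \beta_t)\bigl(v_{t-1} - \nabla f(x_{t-1}; \xi_t)\bigr), \qquad x_{t+1} = x_t - \eta\, v_t.$$

On a Riemannian manifold $\mathcal{M}$, the analogous recursion is typically written with the Riemannian gradient $\operatorname{grad} f$, a vector transport $\mathcal{T}_{x_{t-1}}^{x_t}$ carrying tangent vectors from $T_{x_{t-1}}\mathcal{M}$ to $T_{x_t}\mathcal{M}$, and a retraction $R$ replacing the Euclidean update:

$$v_t = \operatorname{grad} f(x_t; \xi_t) + (1 - \beta_t)\,\mathcal{T}_{x_{t-1}}^{x_t}\bigl(v_{t-1} - \operatorname{grad} f(x_{t-1}; \xi_t)\bigr), \qquad x_{t+1} = R_{x_t}(-\eta\, v_t).$$

Note that $\beta_t = 1$ recovers plain SGD, while $\beta_t = 0$ yields a SARAH-style recursive estimator; intermediate values interpolate between the two, which illustrates the sense in which a single recursion can unify the prevailing VR estimators.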
Lay Summary: In many practical machine learning tasks, the parameter space is not the familiar Euclidean space but a Riemannian space; for example, the parameter space may be a sphere. We study algorithms for solving machine learning optimization problems within Riemannian spaces. We first present a new algorithm whose theoretical guarantees match the best-known results and which also performs excellently in numerical experiments. Additionally, we provide a unified algorithmic framework that encompasses several previous algorithms, facilitating a better understanding and application of such algorithms.
Primary Area: Optimization->Everything Else
Keywords: variance reduction; recursive momentum; unified theory; Riemannian non-convex optimization
Submission Number: 5712