Optimal Local Convergence Rates of Stochastic First-Order Methods under Local $\alpha$-PL

Published: 03 Feb 2026, Last Modified: 03 Feb 2026 · AISTATS 2026 Poster · CC BY 4.0
Abstract: We study the local oracle complexity of stochastic first-order methods under a local $\alpha$–Polyak--Łojasiewicz ($\alpha$–PŁ) condition in a neighborhood of a target connected component $\mathcal M'$ of the local minimizer set. The parameter $\alpha\in[1,2]$ is the exponent of the gradient norm in the $\alpha$–PŁ inequality: $\alpha=2$ recovers the classical PŁ case, $\alpha=1$ corresponds to H\"older-type error bounds, and intermediate values interpolate between these regimes. Our performance criterion is the number of oracle queries required to output $\hat x$ with $F(\hat x)-l\le\varepsilon$, where $l:=F(y)$ for any $y\in\mathcal M'$. We work in a local regime where the algorithm is initialized near $\mathcal M'$ and, with high probability, its iterates remain in that neighborhood. We establish a lower bound $\Omega(\varepsilon^{-2/\alpha})$ for all stochastic first-order methods in this regime, and we obtain a matching upper bound $\mathcal O(\varepsilon^{-2/\alpha})$ for $1\le \alpha<2$ via a SARAH-type variance-reduced method with time-varying batch sizes and step sizes. Thus, for $1\le\alpha<2$, the optimal dependence on $\varepsilon$ is $\Theta(\varepsilon^{-2/\alpha})$. In the convex setting, assuming a local $\alpha$–PŁ condition on the $\varepsilon$-sublevel set, we further show a complexity lower bound $\widetilde{\Omega}(\varepsilon^{-2/\alpha})$ for reaching an $\varepsilon$-global optimum, matching the $\varepsilon$-dependence of known accelerated stochastic subgradient methods.
Submission Number: 1344