Improved Lower Bounds for First-order Stochastic Non-convex Optimization under Markov Sampling

Zhenyu Sun; Ermin Wei

Improved Lower Bounds for First-order Stochastic Non-convex Optimization under Markov Sampling

Zhenyu Sun, Ermin Wei

Published: 01 May 2025, Last Modified: 18 Jun 2025ICML 2025 posterEveryoneRevisionsBibTeXCC BY 4.0

Abstract: Unlike its vanilla counterpart with i.i.d. samples, stochastic optimization with Markovian sampling allows the sampling scheme following a Markov chain. This problem encompasses various applications that range from asynchronous distributed optimization to reinforcement learning. In this work, we lower bound the sample complexity of finding $\epsilon$-approximate critical solutions for any first-order methods when sampling is Markovian. We show that for samples drawn from stationary Markov processes with countable state space, any algorithm that accesses smooth, non-convex functions through queries to a stochastic gradient oracle, requires at least $\Omega(\epsilon^{-4})$ samples. Moreover, for finite Markov chains, we show a $\Omega(\epsilon^{-2})$ lower bound and propose a new algorithm, called MaC-SAGE, that is proven to (nearly) match our lower bound.

Lay Summary: This research looks at how we can solve optimization problems when data is not randomly shuffled, but instead follows a pattern—like stepping from one state to another in a sequence, much like how weather changes from sunny to rainy over time. This kind of data shows up in areas like machine learning, robotics, and systems that work in parallel or over time. We studied how hard it is, in theory, to find good solutions under this kind of patterned data. We proved that, depending on how the data moves between states, there is a minimum amount of data needed to reach a "good enough" solution—generally much more than if the data were completely random. To tackle this challenge, we also designed a new method called MaC-SAGE, which performs nearly as well as the best possible under these conditions. This helps improve how we train learning systems that face time-related or structured data.

Primary Area: Optimization->Stochastic

Keywords: non-convex optimization, markov sampling

Submission Number: 5082

Loading