Derivative-Free Optimization via Monotonic Stochastic Search

20 Sept 2025 (modified: 22 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Derivative-Free Optimization, Zeroth-order optimization
Abstract: We consider the problem of minimizing a differentiable function $f:\mathbb{R}^d \to \mathbb{R}$ using only function evaluations, i.e., in the zeroth-order (derivative-free) setting. We propose three related monotone stochastic algorithms: \emph{Monotonic Stochastic Search} (MSS), \emph{persistent Monotonic Stochastic Search} (pMSS), and a gradient-approximation variant of MSS (MSSGA). MSS is a minimal stochastic direct-search method that samples a single Gaussian direction per iteration and performs an improve-or-stay update based on a single perturbation. For smooth non-convex objectives, we prove an averaged gradient-norm rate of $\mathcal{O}(\sqrt{d}/\sqrt{T})$ in expectation, so that $\mathcal{O}(d/\varepsilon^2)$ function evaluations suffice to reach $\mathbb{E}\|\nabla f(\theta^t)\|_2 \le \varepsilon$, improving on the quadratic dependence on $d$ of deterministic direct search while matching the best known stochastic bounds. In addition, we propose a practical variant, pMSS, that reuses successful search directions under a sufficient-decrease condition, and establish that it guarantees $\liminf_{t\to\infty}\|\nabla f(\theta^t)\|_2 = 0$ almost surely. Since MSS relies solely on pairwise comparisons between $f(\theta^t)$ and $f(\theta^t+\alpha_t s_t)$, it falls within the class of optimization algorithms that assume access to an exact ranking oracle. We then generalize this framework to a stochastic ranking-oracle setting satisfying a local power-type margin condition, and demonstrate that a majority vote over $N$ noisy comparisons preserves the $\mathcal{O}(d/\varepsilon^2)$ gradient complexity in terms of iteration count, given suitably designed oracle queries. MSSGA uses finite-difference directional derivatives while enforcing monotonic descent. In the smooth non-convex regime, we show that the best gradient iterate converges at a rate of $o(1/\sqrt{T})$ almost surely. To the best of our knowledge, this is the first $o(1/\sqrt{T})$ almost-sure convergence guarantee for gradient-approximation methods employing random directions. Furthermore, our analysis extends to the classical Random Gradient-Free (RGF) algorithm, establishing the same almost-sure convergence rate, which has not previously been shown for RGF. Finally, we show that MSS remains robust beyond the smooth setting: when $f$ is merely continuously differentiable, the iterates satisfy $\liminf_{t\to\infty}\|\nabla f(\theta^t)\|_2=0$ almost surely.
Primary Area: optimization
Submission Number: 24218
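To make the improve-or-stay update described in the abstract concrete, the following is a minimal Python sketch of the MSS loop. It assumes a fixed step size `alpha` and a fixed iteration budget `T`; the function names, the constant step size (rather than the paper's schedule $\alpha_t$), and the test objective are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mss(f, theta0, alpha=0.1, T=1000, seed=0):
    """Sketch of Monotonic Stochastic Search (MSS): one Gaussian direction
    per iteration, accept the perturbed point only if it decreases f."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    f_cur = f(theta)
    for _ in range(T):
        s = rng.standard_normal(theta.shape)   # single Gaussian search direction
        candidate = theta + alpha * s          # single perturbation of the current iterate
        f_cand = f(candidate)
        if f_cand < f_cur:                     # improve-or-stay: monotone descent
            theta, f_cur = candidate, f_cand
    return theta, f_cur

# Usage on a simple quadratic test function (illustrative only)
theta_hat, f_hat = mss(lambda x: np.sum(x**2), theta0=np.ones(10), alpha=0.05, T=5000)
```

Because the update uses only the comparison `f_cand < f_cur`, the same loop can be driven by a ranking oracle in place of exact function values, which is the setting the abstract generalizes to.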