Keywords: Zeroth-Order Optimization, Silver Stepsize, Gradient-Free
Abstract: We study gradient-free minimization of smooth convex functions through Silver stepsizes—a non-monotone, 2-adic schedule that accelerates gradient descent—and show how to compose it with two-point zeroth-order (ZO) estimators on a smoothed objective.
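As a concrete reference point, here is a minimal sketch of the 2-adic construction commonly associated with the Silver schedule, assuming the standard formula $a_t = 1 + \rho^{\nu_2(t)-1}$ with silver ratio $\rho = 1+\sqrt{2}$ and $\nu_2(t)$ the 2-adic valuation of $t$, with steps in units of $1/L$ for an $L$-smooth objective; the paper's exact finite-horizon construction may differ in details.

```python
import numpy as np

def two_adic_valuation(t: int) -> int:
    """Largest k such that 2**k divides t (t >= 1)."""
    k = 0
    while t % 2 == 0:
        t //= 2
        k += 1
    return k

def silver_stepsizes(n: int, L: float = 1.0) -> np.ndarray:
    """Silver stepsize schedule a_t = 1 + rho**(nu_2(t) - 1), rho = 1 + sqrt(2).

    Steps are scaled by 1/L for an L-smooth objective. The schedule is
    non-monotone: occasional long "spike" steps interleave with short ones.
    """
    rho = 1.0 + np.sqrt(2.0)
    steps = np.array([1.0 + rho ** (two_adic_valuation(t) - 1) for t in range(1, n + 1)])
    return steps / L

# First 7 steps: roughly [1.41, 2.0, 1.41, 3.41, 1.41, 2.0, 1.41]
print(np.round(silver_stepsizes(7), 2))
```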
We apply the Silver schedule's multi-step Lyapunov analysis to smoothed objectives and show that it carries over essentially verbatim when gradients are replaced by unbiased two-point estimators, up to an additive quadratic variance term.
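For concreteness, a sketch of the standard two-point (central-difference) estimator with Gaussian directions, which is unbiased for the gradient of the Gaussian-smoothed surrogate $f_\mu(x) = \mathbb{E}_{u \sim \mathcal{N}(0,I)} f(x + \mu u)$; the function names and the specific direction distribution are illustrative assumptions, not necessarily the paper's exact setup.

```python
import numpy as np

def two_point_grad(f, x: np.ndarray, mu: float, rng: np.random.Generator) -> np.ndarray:
    """Two-point estimator g = (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u, u ~ N(0, I).

    E[g] equals the gradient of the Gaussian-smoothed objective
    f_mu(x) = E_u[f(x + mu*u)], so g is an unbiased gradient for the smoothed
    surrogate (not for f itself). Only two forward evaluations of f are needed.
    """
    u = rng.standard_normal(x.shape)
    return (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u

# Example: one zeroth-order step on a quadratic.
rng = np.random.default_rng(0)
f = lambda z: 0.5 * np.dot(z, z)
x = np.ones(10)
g = two_point_grad(f, x, mu=1e-3, rng=rng)
x = x - 0.1 * g
```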
We control this term via an orthogonal-on-spikes batching policy that allocates directions proportionally to the Silver steps (with a cap at dimension), achieving budget-optimal variance aggregation.
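One possible instantiation of such a batching policy (the proportionality constant and helper names are assumptions for illustration): at step $t$, allocate a number of directions proportional to the Silver step $a_t$, cap it at the dimension $d$, and orthonormalize the directions within the batch so that the long "spike" steps aggregate variance along orthogonal directions.

```python
import numpy as np

def batched_orthogonal_grad(f, x, mu, step, base_batch, rng):
    """Two-point estimate averaged over a batch of orthogonal directions.

    The batch size grows proportionally to the (Silver) stepsize `step` and is
    capped at the dimension; directions are orthonormalized via QR so the
    per-batch probes span orthogonal subspaces ("orthogonal on spikes").
    """
    d = x.size
    b = min(d, int(np.ceil(base_batch * step)))   # directions proportional to stepsize, capped at d
    U = rng.standard_normal((d, b))
    Q, _ = np.linalg.qr(U)                        # orthonormal columns
    Q *= np.sqrt(d)                               # rescale columns to norm sqrt(d) so E[u u^T] = I
    g = np.zeros(d)
    for i in range(b):
        u = Q[:, i]
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g / b
```

A full iteration under these assumptions would then read $x \leftarrow x - (a_t / L)\, g_t$, reusing the Silver steps from the first sketch.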
Empirically, we validate the approach through numerical experiments and MeZO-style, forward-pass-only fine-tuning of large language models, incorporating practical considerations such as clipping strategies, and demonstrate improved performance in both settings.
Primary Area: optimization
Submission Number: 25582