Stochastic Gaussian Zeroth-Order Optimization: Improved Convergence Analysis under Skewed Hessian Spectra

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: stochastic zeroth-order optimization, quadratic regularity, skewed Hessian spectra
Abstract: This paper addresses large-scale finite-sum optimization problems, which are prevalent in the big-data era. Stochastic methods have become essential tools in zeroth-order optimization, and the most natural zeroth-order stochastic methods build on stochastic gradient descent (SGD). Estimating the stochastic gradient along a random Gaussian direction defines ZO-SGD-Gauss (ZSG), whereas estimating coordinate-wise partial derivatives defines ZO-SGD-Coordinate (ZSC). In practice, ZSG often outperforms ZSC, yet the mechanism behind this phenomenon has remained unclear. To the best of our knowledge, this work is the first to theoretically analyze the potential advantages of ZSG over ZSC. To facilitate the convergence analysis, we introduce a quadratic regularity assumption that generalizes smoothness and strong convexity to the Hessian matrix, making it possible to integrate Hessian information into the complexity analysis. Under this assumption, we prove a significant convergence improvement for ZSG. Finally, experiments on both synthetic and real-world datasets validate our theoretical analysis.
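Since the abstract does not spell out the estimators' formulas, the following is a minimal NumPy sketch of the two gradient estimators as they are commonly defined in the zeroth-order literature; the forward-difference form, the smoothing parameter mu, and the names zsg_gradient / zsc_gradient are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np


def zsg_gradient(f, x, mu=1e-4, rng=None):
    # Gaussian zeroth-order estimate (ZSG-style): one forward difference
    # along a random direction u ~ N(0, I). Single-sample form; in
    # practice this is often averaged over several directions per step.
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(x.shape)
    return (f(x + mu * u) - f(x)) / mu * u


def zsc_gradient(f, x, mu=1e-4):
    # Coordinate-wise zeroth-order estimate (ZSC-style): one forward
    # difference per coordinate, i.e. d + 1 evaluations in dimension d.
    fx = f(x)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e.flat[i] = 1.0
        g.flat[i] = (f(x + mu * e) - fx) / mu
    return g


# Toy usage: full-batch updates on a quadratic whose Hessian spectrum is
# skewed; the stochastic variants would instead evaluate f on a sampled
# minibatch of the finite sum at each step.
if __name__ == "__main__":
    H = np.diag([100.0, 1.0, 0.01])  # skewed eigenvalues
    f = lambda x: 0.5 * x @ H @ x
    x = np.ones(3)
    for _ in range(500):
        x -= 1e-3 * zsg_gradient(f, x)
    print(f(x))
```

Under these standard definitions, the per-step query cost already differs: ZSG needs two function evaluations per sampled direction, while ZSC needs d + 1 evaluations in dimension d.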
Primary Area: optimization
Submission Number: 24536