Advances in Stochastic Zeroth-Order Optimization: The Mechanism Behind the Accelerated Convergence of Gaussian Directions on Objectives with Skewed Hessian Eigenvalues
Keywords: stochastic zeroth-order optimization, quadratic regularity, Gaussian direction, skewed Hessian eigenvalues
Abstract: This paper primarily investigates large-scale finite-sum optimization problems, which are particularly prevalent in the big data era.
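In the standard finite-sum form (stated here in generic notation, with $n$ the number of data points or mini-batches and $d$ the problem dimension), these problems read
$$\min_{x \in \mathbb{R}^d} \; f(x) := \frac{1}{n} \sum_{i=1}^{n} f_i(x).$$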
In the field of zeroth-order optimization, stochastic optimization methods have become essential tools.
Natural zeroth-order stochastic optimization methods are primarily built on stochastic gradient descent ($\texttt{SGD}$).
The method that constructs the stochastic gradient estimate from finite differences along a random Gaussian direction is referred to as $\texttt{ZO-SGD-Gauss}$ ($\texttt{ZSG}$), while the method that estimates partial derivatives along the coordinate directions is known as $\texttt{ZO-SGD-Coordinate}$ ($\texttt{ZSC}$).
Compared to $\texttt{ZSC}$, $\texttt{ZSG}$ often demonstrates superior performance in practice.
However, the mechanism behind this phenomenon has remained unclear.
To the best of our knowledge, our work is the first to theoretically analyze the potential advantages of $\texttt{ZSG}$ over $\texttt{ZSC}$.
Unlike the standard assumptions used in general stochastic optimization analyses, we adopt the quadratic regularity assumption, which generalizes smoothness and strong convexity to norms induced by the Hessian matrix.
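One common formulation of quadratic regularity, stated here with illustrative constants $\gamma_\ell, \gamma_u$ and the Hessian-weighted norm $\|\cdot\|_{\nabla^2 f(x)}$, bounds the objective between two Hessian-weighted quadratics:
$$\frac{\gamma_\ell}{2}\,\|y-x\|_{\nabla^2 f(x)}^2 \;\le\; f(y) - f(x) - \langle \nabla f(x),\, y-x\rangle \;\le\; \frac{\gamma_u}{2}\,\|y-x\|_{\nabla^2 f(x)}^2,$$
which recovers strong convexity and smoothness when the Hessian is replaced by the identity, and holds with $\gamma_\ell = \gamma_u = 1$ for quadratic objectives.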
This assumption allows us to incorporate Hessian information into the complexity analysis.
When the objective function is quadratic, the quadratic regularity assumption reduces to the exact second-order Taylor expansion of the function, and in this setting we analyze and prove the significant improvement achieved by $\texttt{ZSG}$.
For other classes of objective functions, we also establish the convergence of $\texttt{ZSG}$ and show that its query complexity can be better than that of $\texttt{ZSC}$.
Finally, experimental results on both synthetic and real-world datasets corroborate our theoretical analysis.
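For concreteness, here is a minimal sketch of the two zeroth-order gradient estimators compared above, assuming the standard forward-difference forms; the smoothing parameter mu, the function names, and the toy quadratic are illustrative choices, not the paper's implementation.

import numpy as np

def zsg_gradient(f, x, mu=1e-4, rng=None):
    """ZO-SGD-Gauss (ZSG): two-point estimate along one random Gaussian direction."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(x.shape)            # random Gaussian direction
    return (f(x + mu * u) - f(x)) / mu * u      # directional finite difference, scaled by u

def zsc_gradient(f, x, mu=1e-4):
    """ZO-SGD-Coordinate (ZSC): finite-difference partial derivative along every coordinate."""
    g = np.zeros_like(x)
    fx = f(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = 1.0
        g[i] = (f(x + mu * e) - fx) / mu        # forward difference along coordinate i
    return g

# Toy quadratic with a skewed Hessian spectrum, and one SGD-style step per estimator.
H = np.diag(np.array([100.0, 1.0, 0.01]))       # eigenvalues span four orders of magnitude
f = lambda z: 0.5 * z @ H @ z
x = np.ones(3)
step = 1e-3
x_zsg = x - step * zsg_gradient(f, x)           # 2 function queries per step
x_zsc = x - step * zsc_gradient(f, x)           # d + 1 function queries per step
print(f(x_zsg), f(x_zsc))

Each $\texttt{ZSG}$ step above uses two function queries, whereas each $\texttt{ZSC}$ step uses $d+1$; the analysis studies how this trade-off plays out in overall query complexity when the Hessian spectrum is skewed.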
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 14041