Balancing Gradient and Hessian Queries in Non-Convex Optimization

Published: 18 Sept 2025, Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: non-convex optimization, critical point, gradient query, Hessian query, dimension-dependent method
TL;DR: We develop optimization methods that offer new trade-offs between the number of gradient and Hessian computations needed to compute a critical point of a non-convex function.
Abstract: We develop optimization methods which offer new trade-offs between the number of gradient and Hessian computations needed to compute a critical point of a non-convex function. We provide a method that, for a twice-differentiable $f\colon \mathbb{R}^d \rightarrow \mathbb{R}$ with an $L_2$-Lipschitz Hessian, an initial point with $\Delta$-bounded sub-optimality, and sufficiently small $\epsilon > 0$, outputs an $\epsilon$-critical point, i.e., a point $x$ such that $\|\nabla f(x)\| \leq \epsilon$, using $\tilde{O}(\Delta^{3/2} L_2^{3/4} n_H^{-1/2}\epsilon^{-9/4})$ queries to a gradient oracle and $n_H$ queries to a Hessian oracle. As a consequence, we obtain an improved gradient query complexity of $\tilde{O}(d^{1/3}L_2^{1/2}\Delta\epsilon^{-3/2})$ in the case of bounded dimension and of $\tilde{O}(\Delta^{3/2} L_2^{3/4}\epsilon^{-9/4})$ in the case where we are allowed only a single Hessian query. We obtain these results through a more general algorithm which can handle approximate Hessian computations and which recovers the known prior state-of-the-art bound of $O(\Delta L_1^{1/2} L_2^{1/4}\epsilon^{-7/4})$ gradient queries for computing an $\epsilon$-critical point, under the additional assumption that $f$ has an $L_1$-Lipschitz gradient.
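A quick sanity check of how the two corollaries follow from the general trade-off (a sketch, assuming the standard reduction in which one full Hessian is emulated by $O(d)$ finite-difference gradient queries, so $n_H$ Hessian queries contribute $O(d\, n_H)$ to the gradient-query budget): balancing the two terms of
$$T(n_H) \;=\; \tilde{O}\big(\Delta^{3/2} L_2^{3/4}\, n_H^{-1/2}\, \epsilon^{-9/4}\big) \;+\; O(d\, n_H)$$
gives $n_H = \tilde{\Theta}\big(d^{-2/3}\Delta L_2^{1/2}\epsilon^{-3/2}\big)$ and hence $T = \tilde{O}\big(d^{1/3}L_2^{1/2}\Delta\epsilon^{-3/2}\big)$, the bounded-dimension bound, while setting $n_H = 1$ directly yields the $\tilde{O}\big(\Delta^{3/2} L_2^{3/4}\epsilon^{-9/4}\big)$ single-Hessian-query bound.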
Primary Area: Optimization (e.g., convex and non-convex, stochastic, robust)
Submission Number: 23016