\section{Introduction}
\label{sec:intro}

In recent years, several optimization algorithms have been proposed, especially for machine learning problems, that achieve improved performance by exploiting the smoothness of the function to be optimized. 
Specifically, these algorithms have better performance when the function's first, second, or higher order derivatives are Lipschitz~\cite{Nes08,baes2009estimate,MS13,nesterov2019implementable,GDG+19,JWZ19,bubeck19ahighlysmooth}.

In this paper we study the problem of minimizing a highly smooth convex function given black-box access to the function and its higher order derivatives. The simplest example of the family of problems we consider here is the problem of approximately minimizing a convex function $f:\R^n \to \R$, given access to an oracle that on input $x \in \R^n$ outputs $(f(x),\nabla f(x))$, under the assumption that the function's first derivative, its gradient $\nabla f$, has bounded Lipschitz constant. This problem can be solved by Nesterov's accelerated gradient descent, and it is known that this algorithm is optimal (in high dimension) among deterministic and randomized algorithms \cite{Nes83,nemirovsky1983problem}. 

More generally, for any positive integer $p$, consider the $p$th-order optimization problem: For known $R>0$, $\eps>0$, and $L_p>0$, we have a $p$ times differentiable convex function $f:\R^n \to \R$ whose $p$th derivative has Lipschitz constant at most $L_p$, which means
\begin{equation}\label{eq:Lp}
    \norm{\nabla^p f(x) - \nabla^p f(y)} \leq L_p \norm{x-y},
\end{equation}
where $\norm{\cdot}$ is the $\ell_2$ norm (for vectors) or induced $\ell_2$ norm 
(for operators). 
Our goal is to find an $\eps$-approximate minimum of this function in a ball of radius $R$, which is any $x^*$ that satisfies
\begin{equation}\label{eq:eps}
    f(x^*)-\min_{x\in B_R(0)} f(x) \leq \eps, 
\end{equation}
where $B_R(0)$ is the $\ell_2$-ball of radius $R$ around the origin. We can access the function $f$ through a $p$th order oracle, which when queried with a point $x \in \R^n$ outputs 
\begin{equation}\label{eq:oracle}
    (f(x), \nabla f(x), \ldots, \nabla^p f(x)).
\end{equation}
As usual, $\nabla^p f(x)$ denotes the $p$th derivative of $f$ at the point $x$.
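
As a simple illustration of these definitions (an example chosen here for concreteness, not one used in our lower bound), take $p=1$ and $f(x) = \frac{1}{2}\norm{x}^2$. Then $\nabla f(x) = x$, so \cref{eq:Lp} holds with $L_1 = 1$, and a first-order oracle query at $x$ returns $(\frac{1}{2}\norm{x}^2, x)$. Since $\min_{x \in B_R(0)} f(x) = f(0) = 0$, a point $x^*$ satisfies \cref{eq:eps} exactly when
\[
    \norm{x^*} \leq \sqrt{2\eps}.
\]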

Our primary object of study will be the minimum query cost of an algorithm that solves the problem, i.e. the number of queries (or calls) to the oracle in \cref{eq:oracle} that an algorithm has to make.\footnote{For simplicity we assume that the oracle's output is computed to arbitrarily many bits of precision. 
This only makes our results stronger, since we prove lower bounds in this paper.
%We also allow the algorithm to query the oracle at any point in $\R^n$, even though our domain is bounded. Since our results establish lower bounds, these assumptions only only make results stronger.
} 
For a fixed $p$, the problem appears to have four parameters, $n$, $L_p$, $R$, and $\eps$, but the last three are not independent: rescaling the input and output spaces of the function changes $L_p$, $R$, and $\eps$ while preserving the quantity $L_pR^{p+1}/\eps$. Thus the query complexity of the problem can be written as a function of $n$ and $L_pR^{p+1}/\eps$ alone. In this paper we focus on the high-dimensional setting, where $n$ may be much larger than the other parameters; the best algorithms in this regime have complexity that depends only on $L_pR^{p+1}/\eps$, with no dependence on $n$.
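
The rescaling can be sketched as follows, where the auxiliary function $g$ and the points $y,y'$ are notation introduced only for this illustration. Given $f$ with parameters $(L_p, R, \eps)$, define $g(y) = f(Ry)/\eps$, which is again convex and $p$ times differentiable. By the chain rule, $\nabla^p g(y) = (R^p/\eps)\,\nabla^p f(Ry)$, and hence
\[
    \norm{\nabla^p g(y) - \nabla^p g(y')}
    = \frac{R^p}{\eps}\,\norm{\nabla^p f(Ry) - \nabla^p f(Ry')}
    \leq \frac{R^p}{\eps}\, L_p\, \norm{Ry - Ry'}
    = \frac{L_pR^{p+1}}{\eps}\,\norm{y-y'}.
\]
Moreover, $Ry \in B_R(0)$ if and only if $y \in B_1(0)$, and $Ry^*$ is an $\eps$-approximate minimum of $f$ in $B_R(0)$ if and only if $g(y^*) - \min_{y \in B_1(0)} g(y) \leq 1$. Since a query to $g$ at $y$ can be answered with a single query to $f$ at $Ry$ (and vice versa), an instance with parameters $(L_p, R, \eps)$ is equivalent to one with parameters $(L_pR^{p+1}/\eps, 1, 1)$.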

As noted, the $p=1$ problem has been studied since the early 1980s \cite{Nes83,nemirovsky1983problem}, and the $p>1$ problem has also been considered~\cite{Nes08,MS13}. In an exciting recent development, new algorithms were proposed for all $p$ (with very similar complexity) by three independent groups of researchers: Gasnikov, Dvurechensky, Gorbunov, Vorontsova, Selikhanovych, and Uribe~\cite{GDG+19}; Jiang, Wang, and Zhang~\cite{JWZ19}; and Bubeck, Jiang, Lee, Li, and Sidford~\cite{bubeck19ahighlysmooth}. 
All three groups develop deterministic algorithms that make
\begin{equation}
    \tilde{O}_p\left(\left({L_pR^{p+1}}/{\epsilon}\right)^{2/(3p+1)} 
    %\log\left({L_pR^{p+1}}/{\epsilon}\right)
    \right)
\end{equation}
oracle calls,\footnote{Note that the query complexity does not have any dependence on the dimension $n$. Of course, actually implementing each query will take $\mathrm{poly}(n)$ time, but we only count the number of queries here.} where the subscript $p$ in the big-O (or big-Omega) notation means that the hidden constant may depend on $p$; in other words, we treat $p$ as a constant. This improved upon the bounds of \cite{baes2009estimate,nesterov2019implementable}, both of which give deterministic algorithms that make
\begin{equation}
    \tilde{O}_p\left(\left({L_pR^{p+1}}/{\epsilon}\right)^{1/(p+1)} 
    %\log\left({L_pR^{p+1}}/{\epsilon}\right)
    \right)
\end{equation}
oracle calls.

The algorithms of \cite{GDG+19,JWZ19,bubeck19ahighlysmooth} are nearly optimal among deterministic algorithms, since the works \cite{nesterov2019implementable,ArjevaniSS19} showed that any deterministic algorithm that solves this problem must make $\Omega_p\left(\left({L_pR^{p+1}}/{\epsilon}\right)^{2/(3p+1)}\right)$
queries. 
However, for randomized algorithms, the known lower bound is weaker. Agarwal and Hazan~\cite{agarwal2018lower} showed that any randomized algorithm must make  
\begin{equation}
    \Omega_p\left(\left({L_pR^{p+1}}/{\epsilon}\right)^{2/(5p+1)}\right)    
\end{equation}
queries. To the best of our knowledge, no lower bounds are known in the setting of high-dimensional smooth convex optimization against quantum algorithms, although quantum lower bounds are known in the low-dimensional setting~\cite{CCLW20,vAGGdW20} and for non-smooth convex optimization~\cite{GKNS21}.
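
To make the gaps between these bounds concrete, the exponents of $L_pR^{p+1}/\eps$ appearing above evaluate, for small $p$, to
\[
    \begin{array}{c|ccc}
        p & 1/(p+1) & 2/(3p+1) & 2/(5p+1) \\ \hline
        1 & 1/2 & 1/2 & 1/3 \\
        2 & 1/3 & 2/7 & 2/11 \\
        3 & 1/4 & 1/5 & 1/8
    \end{array}
\]
This is a direct evaluation of the three formulas, ignoring logarithmic factors and $p$-dependent constants. The exponents $1/(p+1)$ and $2/(3p+1)$ coincide at $p=1$, where both equal the exponent $1/2$ of accelerated gradient descent, and $2/(3p+1)$ is strictly smaller for every $p \geq 2$. The exponent $2/(5p+1)$ is strictly smaller than $2/(3p+1)$ for every $p$, which is the gap left open in the randomized setting.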

In this work, we close the gap (up to log factors) between the best known algorithms and the randomized lower bound for all $p$. Furthermore, our lower bound also holds against quantum algorithms.

\begin{theorem}\label{thm:main}
    Fix any $p \in \N$. For all $\epsilon>0$, $R>0$, $L_p>0$, there exists an $n>0$ and a set of $n$-dimensional functions $\mF$ with $p$th-order Lipschitz constant $L_p$ (i.e., satisfying \cref{eq:Lp}) such that any randomized or quantum algorithm that outputs an $\eps$-approximate minimum (satisfying \cref{eq:eps}) for any function $f \in \mF$ must make 
    \begin{equation}
        \Omega_p\left(\left( {L_pR^{p+1}}/{\epsilon} \right)^{2/(3p+1)} \left(\log\left({L_pR^{p+1}}/{\epsilon}\right)\right)^{-2/3}\right)    
    \end{equation}
    queries to a $p$th order oracle for $f$ (as in \cref{eq:oracle}).
\end{theorem}

In fact, this lower bound holds even against highly parallel randomized algorithms, where the algorithm can make $\mathrm{poly}(n,L_pR^{p+1}/{\epsilon})$ queries in each round and we count only the number of query rounds (and not the total number of queries). See~\cite{BJLLS19} for previous work in this setting, including speedups for first-order convex optimization in the low-dimensional setting. 
%We believe our lower bound also holds for highly parallel quantum algorithms, but have not fleshed out the details. 
%%Maybe let's not say this? It sounds like a limitation and also people may not care?

In this introduction, we have deliberately avoided explaining the quantum model of computation to make the results accessible to readers without a background in quantum computing. The entire paper is written so that the randomized lower bound is fully accessible to any reader who does not wish to understand the quantum model and quantum lower bound. 
For readers familiar with quantum computing, we note that the only change needed to obtain the quantum model is to modify the oracle in \cref{eq:oracle} to support queries in quantum superposition. This is done in the usual way, by defining a unitary implementation of the oracle, which allows quantum algorithms to make superposition queries and potentially solve the problem more efficiently than randomized algorithms.

