

% \begin{figure*}[!h]%\vspace{-2mm}
%   \centering
%   \includegraphics[width=0.85
%   \linewidth,height=!]{low_v.png}\vspace{-3mm}
% %   \includegraphics{synthetic.png}\vspace{-3mm}
%   \caption{Performance over low dimensional problems: Ackley (left), Levy (middle) and Rastrigin (right)}
%   \label{fig:synthetic}
% \end{figure*}

%\vspace{-1.5mm}
\section{Numerical Experiments}\label{sec:num_exp}
\input{ablation/ablation}




%\vspace{-2.0mm}
\subsection{Comparisons with other methods}\label{s:exp}
%\vspace{-1.5mm}
%After tuning the hyper-parameters of CobBO over a number of commonly used benchmarks, we fix 
All experiments use the same default configuration of CobBO, whose settings and hyperparameters are specified in the supplementary materials. 
% together with more experiments. 
%Following the same default configuration, 
Across the following experiments, CobBO performs on par with or better than a collection of state-of-the-art methods. %This further demonstrates the robustness of CobBO.
%\niv{What is the "default setting" ? The begining of the experiments section is a good place to specify those.}
 %We use extensive experiments to demonstrate the performance of CobBO in both trial complexity and time complexity, 
%all conducted with the default setting. 
%Although CobBO has a number of parameters, it is proven to be robust to those.
%\niv{Where is the robustness to parameters is 'proven' ? Reference to some experiment ?}
%For example, CobBO uses a default setting to allocate the number of initial points ($8\%$ of the total budget, capped at $500$), 
%unless it is explicitly specified. 
Most of the experiments follow the settings of TuRBO~\cite{turbo2019}, which compares TuRBO against a comprehensive list of baselines, including BFGS, BOCK~\cite{bock2018}, BOHAMIANN~\cite{bohamiann}, CMA-ES~\cite{cmaes}, BOBYQA, EBO~\cite{wang18aistats}, GP-TS, HeSBO~\cite{chaudhuri2019}, Nelder-Mead and random search. 
To avoid repetition, we only show TuRBO and CMA-ES, which achieve the best performance among these baselines, and additionally compare CobBO with BADS~\cite{luigi2017}, % HDBBO~\cite{zi2017}, SIR~\cite{miao2019},
Differential Evolution (Diff-Evo)~\cite{storn1997differential},
Tree Parzen Estimator (TPE)~\cite{TPE2011} and Adaptive TPE (ATPE)~\cite{ATPE}. 
Comparisons with REMBO~\cite{ziyuw2016} are presented in Appendix E.2.

%The python code of the experiments and implementation of CobBO will be made publicly available.
%We repeat each experiment independently for 30 times to get the 95\% confidence intervals. 
%, and d-KG~\cite{wujian2017}.
%Though LineBO~\cite{linebo} and DROPOUT~\cite{dropoutbo} are also based on subspace selection,
%they do not show comparable performance.  
%Confidence intervals are computed with the results of 30 independent experiments.


%To overcome this limitation, the recent approach BOHB [33] combines Bayesian optimization and HyperBand 
%to achieve the best of both worlds: strong anytime performance (quick improvements in the beginning by using 
%low fidelities in HyperBand) and strong final performance (good performance in the long run by replacing HyperBand’s 
%random search by Bayesian optimization).


%and SigOpt~\cite{sigopt}. 
%SigOpt has an online black-box optimization service without disclosing its underlying algorithm details. 
% In addition to the benchmarks tested in~\cite{turbo2019}, we also provide new benchmarks,
%including deep neural network and industrial melting.  These benchmarks cover a wide spectrum of applications.
%BOBYAQ, BFGS, Nelder-Mead.
%\subsection{The effect of initial sampling}
%Traditionally initial sampling is conducted through random sampling. 





\begin{figure*}[htb]%\vspace{-2mm}
\begin{center}
  %\includegraphics[width=0.9\linewidth,height=!]{figures/low_mid_high_200d_fixed.pdf}%
  \includegraphics[width=0.98\linewidth,height=!]{figures/low_medium_high_100}%
  \vspace{-5mm}
%   \includegraphics{lunar-robot.png}
\end{center}
  \caption{Performance over low (left), medium (middle) and high (right) dimensional problems}
  \label{fig:low_medium_high}\vspace{0.0mm}
%   \vspace{5mm}
\end{figure*}


%\vspace{-1.5mm}
\subsubsection{Low-dimensional tests:}\label{ss:lowDtest}
%\vspace{-1.5mm}
 To evaluate CobBO on low-dimensional problems, we use the lunar landing~\cite{turbo2019} and robot pushing~\cite{wang18aistats} problems as well as synthetic functions, following the setup in~\cite{turbo2019}. For each problem, $95\%$ confidence intervals over $30$ independent runs are shown in Figs.~\ref{fig:low_medium_high} and~\ref{fig:synthetic}.
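For reference, one common way to form such intervals, which we assume here for illustration since the exact construction is not restated in this section, uses a normal approximation. Let $y_t^{(k)}$ denote the best value found by run $k$ within its first $t$ trials, and let $n$ be the number of independent runs ($n=30$ here, $n=10$ for the high-dimensional tests). At each trial index $t$ the plotted band is then
\begin{equation*}
\bar{y}_t \pm 1.96\,\frac{s_t}{\sqrt{n}}, \qquad
\bar{y}_t = \frac{1}{n}\sum_{k=1}^{n} y_t^{(k)}, \qquad
s_t^2 = \frac{1}{n-1}\sum_{k=1}^{n}\bigl(y_t^{(k)} - \bar{y}_t\bigr)^2 .
\end{equation*}
With $n=30$, replacing $1.96$ by the Student-$t$ quantile $t_{0.975,\,29}\approx 2.045$ gives a slightly wider, more conservative band.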
 
 \vspace{0.25cm}
\noindent \emph{Lunar landing (maximization):}
This $12$-dimensional controller-learning problem is provided by OpenAI Gym and was evaluated in~\cite{turbo2019}.
%The controller of a lunar lander decides whether or not to fire the booster engine and the firing direction during landing,   
%based on the current status of the lander in each frame. 
%The average performance of the controller is evaluated by simulations over %a fixed constant set of 
%50 randomly generated terrains and initial states. 
Each algorithm has $50$ initial points and a budget of $1{,}500$ trials. 
TuRBO is configured with $5$ trust regions and a batch size of $50$ as in~\cite{turbo2019}.   
Fig.~\ref{fig:low_medium_high} (upper left) shows that, in some of the $30$ independent runs, CobBO quickly exceeds $300$.  % outperforming other algorithms. 

% \begin{figure*}[htb]%\vspace{-2mm}
% \begin{center}
%   \includegraphics[width=0.75\linewidth,height=!]{lunar-robot.png}\vspace{-3mm}
% %   \includegraphics{lunar-robot.png}
% \end{center}
%   \caption{Performance over the more complicated lunar landing (left) and robot pushing (right) problems}
%   \label{fig:lunar-robot}
% \end{figure*}
% \begin{figure}[htb]\vspace{-2mm}
% \begin{center}
%   \includegraphics[width=0.75\columnwidth,height=!]{low_hard_v.png}\vspace{-3mm}
% %   \includegraphics{lunar-robot.png}
% \end{center}
%   \caption{Performance over the lunar landing (upper) and robot pushing (lower) problems}\vspace{2mm}
%   \label{fig:lunar-robot}
% \end{figure}


\vspace{0.25cm}
\noindent \emph{Robot pushing (maximization):}
This $14$-dimensional control problem was introduced in~\cite{wang18aistats} and extensively tested in~\cite{turbo2019}. We follow the setting in~\cite{turbo2019}, where TuRBO is configured with a batch size of $50$ and $15$ trust regions, each with $30$ initial points. 
%We exclude REMBO that consumes more than $24$ hours per run.  
Each experiment has a budget of $10{,}000$ evaluations.
On average, CobBO exceeds $10$ within $5{,}500$ trials,
%while TuRBO requires about $7000$, 
as shown in Fig.~\ref{fig:low_medium_high} (lower left).
%Some CobBO runs even get close to 11.0 within 6,000 evaluations. 
%TPE and ATPE converge to around $9$, outperforming BADS and CEM-ES by large margins. 
%The latter two exhibit large variations and get stuck in local optima.

% CobBO finds the best results for the robot pushing problem, 
% slightly outperforming TuRBO, as shown in Fig. ~\ref{fig:control-additive}. 
% Both TPE and ATPE  are less competitive but still outperform BADS and CMA-ES with large margins. 
% The latter two algorithms show large variations and get stuck in suboptima at very early stages.

\vspace{0.25cm}
\noindent \emph{Low-dimensional synthetic black-box functions (minimization):} We further test three $10$-dimensional synthetic functions~\cite{TestProblems2013}: Ackley over $[-5, 10]^{10}$, Levy over $[-5, 10]^{10}$ and Rastrigin over $[-3, 4]^{10}$.
%, and Hartmann(6D) with domain $[0, 1]^{6}$
%Each experiment has a budget of $500$ evaluations. 
TuRBO is configured as in~\cite{turbo2019}, with a batch size of $10$ and $5$ concurrent trust regions, each with $10$ initial points; the other algorithms use $20$ initial points.
As shown in Fig.~\ref{fig:synthetic}, CobBO is competitive with or better than the other methods: it finds the best optima among all algorithms on Ackley and Levy, and it outperforms the others on the difficult Rastrigin function. 
BADS, which is better suited to low dimensions as noted in~\cite{luigi2017}, performs close to CobBO except on Rastrigin. 
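For completeness, we recall the standard forms of these three test functions as listed in~\cite{TestProblems2013}; the usual Ackley constants $a=20$, $b=0.2$ and $c=2\pi$ are assumed here, as they are not restated in this section. For $x\in\mathbb{R}^p$ and $w_i = 1 + (x_i-1)/4$,
\begin{align*}
\mathrm{Ackley}(x) &= -a\exp\Bigl(-b\sqrt{\tfrac{1}{p}\textstyle\sum_{i=1}^{p}x_i^2}\Bigr) \\
&\quad - \exp\Bigl(\tfrac{1}{p}\textstyle\sum_{i=1}^{p}\cos(c\,x_i)\Bigr) + a + \exp(1),\\
\mathrm{Levy}(x) &= \sin^2(\pi w_1) + \sum_{i=1}^{p-1}(w_i-1)^2\bigl[1+10\sin^2(\pi w_i+1)\bigr] \\
&\quad + (w_p-1)^2\bigl[1+\sin^2(2\pi w_p)\bigr],\\
\mathrm{Rastrigin}(x) &= 10p + \sum_{i=1}^{p}\bigl[x_i^2 - 10\cos(2\pi x_i)\bigr].
\end{align*}
All three have global minimum value $0$, attained at $x=\mathbf{0}$ for Ackley and Rastrigin and at $x=\mathbf{1}$ for Levy; each of these minimizers lies inside the corresponding search domain used here.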
%TuRBO performs better than TPE and worse than BADS. ATPE outperforms TPE. 
%CMA-ES eventually catches up with TPE, ATPE and REMBO on Ackley.
%REMBO appears unstable with large variations and is trapped in local optima. 
\begin{figure}[!ht]%\vspace{-2mm}
  \centering
  \includegraphics[width=0.95
  \linewidth,height=!]{figures/low_v.png}\vspace{-3mm}
%   \includegraphics{synthetic.png}\vspace{-3mm}
  \caption{Low dimensional problems: Ackley (left), Levy (middle) and Rastrigin (right)}
  \label{fig:synthetic}\vspace{0mm}
\end{figure}

Fig.~\ref{fig:micha} shows that CobBO also optimizes the $10$-dimensional Michalewicz function well, even though this function has symmetric bumps: some subspaces pass through a point symmetrically, while others break this symmetry. 
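In its usual form~\cite{TestProblems2013}, assuming the common steepness parameter $m=10$ and domain $[0,\pi]^p$ (the exact setting is not restated here), the Michalewicz function reads
\begin{equation*}
\mathrm{Michalewicz}(x) = -\sum_{i=1}^{p}\sin(x_i)\,\sin^{2m}\!\Bigl(\frac{i\,x_i^2}{\pi}\Bigr).
\end{equation*}
According to~\cite{TestProblems2013}, $m$ controls the steepness of its valleys and ridges, and a larger $m$ leads to a more difficult search.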
\begin{figure}[!htb]
  \centering
    \includegraphics[width=0.45\textwidth]{ICML/michal.png}
     \caption{Performance over the low dimensional Michalewicz function with symmetrical and asymmetrical subspaces} 
    \label{fig:micha}
\end{figure}
We have also applied CobBO to other real applications, including parameter tuning for recommendation systems, online database performance tuning, and simulation-based parameter optimization. Since these results deviate from the main focus of this paper and would require a detailed description of each application's background, we do not present them here. 


\vspace{0.25cm}
\noindent \emph{Medium-sized synthetic black-box functions (minimization):}
We test three $30$-dimensional synthetic functions: Ackley on $[-5, 10]^{30}$, Levy on $[-5, 10]^{30}$ and Rastrigin on $[-3, 4]^{30}$. In addition, 
we experiment with an additive function of $36$ dimensions, defined as $f_{36}(x)=\mathrm{Ackley}(x_1) + \mathrm{Levy}(x_2) + \mathrm{Rastrigin}(x_3) + \mathrm{Hartmann}(x_4)$, where the first three terms are the same functions over the same domains as in Section~\ref{ss:lowDtest}, and the Hartmann function is defined over $[0, 1]^{6}$. 
TuRBO is configured identically to Section~\ref{ss:lowDtest}, with a batch size of $10$ and $5$ trust regions with $10$ initial points each. The other algorithms use $20$ initial points.
As shown in Figs.~\ref{fig:30D-tests} and~\ref{fig:additive-36D}, CobBO is competitive with or better than all tested methods on each of these problems.
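Written out with explicit coordinate blocks (the block ordering below is our assumption for illustration), the input splits as $x = (x_1, x_2, x_3, x_4) \in \mathbb{R}^{10}\times\mathbb{R}^{10}\times\mathbb{R}^{10}\times\mathbb{R}^{6}$, so that $10+10+10+6=36$, with
\begin{equation*}
x_1, x_2 \in [-5,10]^{10}, \qquad x_3 \in [-3,4]^{10}, \qquad x_4 \in [0,1]^{6}.
\end{equation*}
Because the blocks share no coordinates, the minimum of $f_{36}$ is the sum of the block minima, $0+0+0+\min\mathrm{Hartmann}\approx -3.32$, assuming the standard six-dimensional Hartmann function.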
 \begin{figure}[!htb]\vspace{-0mm}
   \centering
   \includegraphics[width=0.8\columnwidth,height=!]{supplementary/additive-36D.png}%\vspace{1mm}
   \caption{Performance over an additive function of 36 dimensions}%\vspace{5mm}
   \label{fig:additive-36D}
 \end{figure}


 
 



%\vspace{-1.5mm}
\subsubsection{High-dimensional tests:}
Since each experiment takes a long time, we report $95\%$ confidence intervals over $10$ independent runs for each problem.

\noindent \emph{Additive latent structure (minimization):}
As mentioned in 
%the related work
Section~\ref{sec:related_work}, additive latent structures have been exploited %for tackling challenges 
in high dimensions.
%which however incur a high computational cost~\cite{chaudhuri2019}.   %For $x=(x_1, x_2, x_3, x_4)$,  
We construct an additive function of $56$ dimensions, defined as $f_{56}(x) = \mathrm{Ackley}(x_1) + \mathrm{Levy}(x_2) + \mathrm{Rastrigin}(x_3) + \mathrm{Hartmann}(x_4) +\mathrm{Rosenbrock}(x_5)+\mathrm{Schwefel}(x_6)$, where the first three terms are the functions over the domains described in Section~\ref{ss:lowDtest}, the Hartmann function is defined on $[0, 1]^{6}$, and the Rosenbrock and Schwefel functions are defined on $[-5,10]^{10}$ and $[-500,500]^{10}$, respectively. 
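Here $x_1, x_2, x_3, x_5, x_6 \in \mathbb{R}^{10}$ and $x_4 \in \mathbb{R}^{6}$, giving $5\cdot 10 + 6 = 56$ dimensions. Assuming the standard forms of the two additional terms~\cite{TestProblems2013}, for $z\in\mathbb{R}^{p}$,
\begin{align*}
\mathrm{Rosenbrock}(z) &= \sum_{i=1}^{p-1}\bigl[100\,(z_{i+1}-z_i^2)^2 + (z_i-1)^2\bigr],\\
\mathrm{Schwefel}(z) &= 418.9829\,p - \sum_{i=1}^{p} z_i\sin\bigl(\sqrt{|z_i|}\bigr).
\end{align*}
Rosenbrock attains $0$ at $z=\mathbf{1}$, and Schwefel, which has many local minima, attains approximately $0$ at $z_i \approx 420.9687$ for all $i$, inside $[-500,500]$. Since the Ackley, Levy and Rastrigin blocks also have minimum value $0$, the global minimum of $f_{56}$ is approximately $-3.32$, the minimum of the standard six-dimensional Hartmann function.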

We compare CobBO with TPE, BADS, CMA-ES and TuRBO, each with $100$ initial points. 
% and a budget of 5,000 evaluations. 
Specifically, TuRBO is configured with $1$ trust region and a batch size of $100$. 
ATPE is excluded because it takes more than 24 hours to finish a single run. 
%The other algorithms have a budget of 10,000 evaluations for $f_{56}$. The experiment setup is the same as for $f_{36}$, 
%except that the batch size of TuRBO is set to 100. 
As shown in Fig.~\ref{fig:low_medium_high} (upper middle), CobBO quickly finds the best solution among these algorithms.
%the algorithms tested.

%As shown in Fig.~\ref{fig:control-additive}, CobBO finds the best solutions for both $f_{36}$  and $f_{56}$. 
%BADS performs closely to CobBO. ATPE outperforms TPE, TuRBO and CMA-ES on $f_{36}$. 
%TuRBO surpasses TPE and CMA-ES on $f_{36}$ eventually, while TPE and CMA-ES converge faster than TuRBO on $f_{56}$.

% \begin{figure*}[htb]\vspace{-2mm}
% \begin{center}
%   \includegraphics[width=1.0\linewidth,height=!]{highDims.png}
% %   \includegraphics{highDims.png}
% \end{center}
%   \caption{Performance over medium-size dimensional problems: 36D (left) and 56D (middle) additive functions and the 60D rover trajectory planning (right)}%\vspace{-1.5mm}
%   \label{fig:highDims}
% \end{figure*}
% \begin{figure}[htb]\vspace{-2mm}
% \begin{center}
%   \includegraphics[width=0.75\columnwidth,height=!]{medium_v.png}\vspace{-3mm}
% %   \includegraphics{highDims.png}
% \end{center}
%   \caption{Performance over medium-size dimensional problems: 56D additive functions (upper) and the 60D rover trajectory planning (lower)}\vspace{-1.5mm}
%   \label{fig:highDims}
% \end{figure}

\noindent \emph{Rover trajectory planning (maximization):} 
This $60$-dimensional problem was introduced in~\cite{wang18aistats}. 
The objective is to find a collision-free trajectory specified by a sequence of $30$ positions in a 2-D plane, i.e., $30 \times 2 = 60$ coordinates. 
%$[0,1]^{2}$. 
We compare CobBO with TuRBO, TPE and CMA-ES, each with a budget of $20{,}000$ evaluations and $200$ initial points. 
TuRBO is configured with $15$ trust regions and a batch size of $100$, as in~\cite{turbo2019}. 
ATPE, BADS and REMBO are excluded for this problem and the following ones, as they all take more than 24 hours per run. 
%The results are shown in 
Fig.~\ref{fig:low_medium_high} (lower middle) shows that CobBO performs well on this problem. 
%quickly reaches the best solution.
%among the tested algorithms faster than TuRBO, while TPE and CMA-ES reach inferior solutions.

\noindent \emph{The 100-dimensional Levy and Rastrigin functions (minimization):} %\label{sec:100d}
We minimize the Levy and Rastrigin functions on $[-5, 10]^{100}$ with $500$ initial points. 
TuRBO is configured with $15$ trust regions and a batch size of $100$.
As noted in~\cite{turbo2019}, these two problems are challenging and have no redundant dimensions. 
 Fig.~\ref{fig:100d} shows that CobBO can greatly reduce the trial complexity. 
 For Levy (left), it finds solutions close to its final one within $1{,}000$ trials and eventually reaches the best solution among all the algorithms tested.
 For Rastrigin (right), within $2{,}000$ trials CobBO and TuRBO surpass the final solutions of all the other methods, and they eventually outperform them by a large margin.
 
 %\clearpage
  \begin{figure}[!htb]
   \centering
   \includegraphics[width=0.8\columnwidth,height=!]{figures/synth_100d.pdf}
   \caption{Performance over high dimensional synthetic problems: Levy (left) and Rastrigin (right)}
   \label{fig:100d} %\vspace{5mm}
 \end{figure}
 
  On high-dimensional problems, REMBO is too slow for almost all of the tested problems and yields poor results. For example, on the $200$-dimensional Ackley function with $4{,}000$ iterations, REMBO and CobBO reach mean best values of $15.1$ and $3.8$ after running for $31.2$ and $3.4$ hours, respectively. %With the effective dimension of REMBO set to $20$ (similarly to CobBO, whose average is about 15). 
  Thus, CobBO outperforms REMBO by a large margin while requiring only about $11\%$ ($3.4/31.2$) of its computation time in this experiment.


%\vspace{0.5mm}
\noindent \emph{The 200-dimensional Levy and Ackley functions (minimization):}
We minimize the Levy and Ackley functions over $[-5, 10]^{200}$ with $500$ initial points. 
TuRBO-1 (i.e., TuRBO with a single trust region) uses a batch size of $100$.
These two problems are challenging and have no redundant dimensions. 
 For Levy, Fig.~\ref{fig:low_medium_high} (upper right) shows that CobBO reaches $100$ within $2{,}000$ trials, while CMA-ES and TuRBO 
 only reach $200$ after $8{,}000$ trials. TPE cannot find a comparable solution within $10{,}000$ trials. 
 For Ackley, as shown in Fig.~\ref{fig:low_medium_high} (lower right), TuRBO, CMA-ES and CobBO converge to mean best values of $4.53$, $3.33$ and $2.91$, respectively, after $20{,}000$ trials. For consistency with Levy, we show only the first $10{,}000$ trials, which also highlights the effectiveness of CobBO at relatively low query budgets for high-dimensional functions. %CobBO reaches the best solution among all of the algorithms tested. 
%  The appealing trial complexity of CobBO suggests that it can be applied in a hybrid method, e.g., used in the first stage of the query process combined with gradient estimation methods or CMA-ES.

To compare running times fairly, we change the configuration so that both TuRBO and CobBO use the same batch size of $1$. On Ackley, CobBO runs for $12.8$ CPU hours, whereas TuRBO-1 runs for more than $80$ CPU hours (or $9.6$ \emph{GPU} hours). The other methods either make no progress or find far worse solutions.


\subsubsection{Comparison to LineBO}\label{ss:linebo}
 Although it shares some basic ideas with CobBO, LineBO~\cite{linebo} reduces the cost of acquisition maximization by restricting it to a line, but it does not reduce the expensive computational cost of GP regression in the full space. Fig.~\ref{fig:linebo} shows a typical example on the $10$-dimensional Ackley function ($D=10$), where CobBO significantly outperforms LineBO. 
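To make this distinction concrete, a simplified sketch of one LineBO-style iteration (our paraphrase of~\cite{linebo}; the symbols $\hat{x}_t$, $u_t$ and $\mathcal{X}$ are introduced only for this illustration) picks a direction $u_t$ through the current point $\hat{x}_t$ and sets
\begin{equation*}
x_{t+1} = \hat{x}_t + \lambda_t^{\star} u_t, \qquad
\lambda_t^{\star} \in \arg\max_{\lambda\in\mathbb{R}:\; \hat{x}_t + \lambda u_t \in \mathcal{X}} \alpha_t(\hat{x}_t + \lambda u_t),
\end{equation*}
so the inner maximization is one-dimensional. The GP posterior that defines the acquisition function $\alpha_t$, however, is still conditioned on all observations in the full $D$-dimensional space, and exact GP inference with $n$ observations still costs $\mathcal{O}(n^3)$ regardless of the line restriction.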
\begin{figure}[ht]
    \centering
    \includegraphics[width=0.45\textwidth, height=!]{ICML/linebo_ackley}
   \caption{A typical example of CobBO outperforming different variants of LineBO}
   \label{fig:linebo}
\end{figure}
In another typical experiment with $D=30$ and a query budget of $5{,}000$, CobBO reaches $0.12$, whereas LineBO reaches $7.6$. 

\subsubsection{Comparison to ALEBO}\label{ss:alebo}
ALEBO~\cite{letham2020} is designed for high-dimensional (large $D$) problems with low intrinsic dimensions (small $d$). For comparison, we first test CobBO using exactly the same setting as in~\cite{letham2020} for Hartmann6 with $D=1000$ dimensions and only $d=6$ intrinsic dimensions, as shown in Fig.~\ref{fig:hart6}. 
%We also test on problems without the assumption on
%such sparsities or a priori knowledge about the exact 
Then, for general problems without the assumption of a low intrinsic dimension, we test ALEBO on the $10$-dimensional Ackley function, where $D=d=10$, in three sets of experiments shown in Fig.~\ref{fig:alebo}. Since ALEBO requires a user-specified embedding dimension smaller than $D$, we set it to $2$, $4$ and $8$ (denoted ALEBO-2, ALEBO-4 and ALEBO-8, respectively). 
Figs.~\ref{fig:hart6} and~\ref{fig:alebo} show
the final results over $30$ repetitions of each experiment.


\begin{figure}[ht]
    \centering
    \includegraphics[width=0.55\textwidth, height=!]{supplementary/hart1000.pdf}
  \caption{Performance on Hartmann6 ($D=1000$, $d=6$)}
  \label{fig:hart6}
\end{figure}
In the first case, ALEBO indeed outperforms CobBO, since CobBO is not designed for functions with a low effective dimension, i.e., $f(x) = g(\Phi x)$ for some function $g(\cdot)$ and matrix $\Phi\in\mathbb{R}^{d\times D}$ with $d\ll D$, which essentially assumes that $f(x)$ does not change along certain directions. 
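The invariance behind this assumption follows in one line. If $f(x) = g(\Phi x)$ with $\Phi\in\mathbb{R}^{d\times D}$ and $d<D$, then by rank--nullity the null space of $\Phi$ has dimension at least $D-d$, and for any $v$ with $\Phi v = 0$,
\begin{equation*}
f(x+v) = g(\Phi x + \Phi v) = g(\Phi x) = f(x).
\end{equation*}
For Hartmann6 embedded in $D=1000$ dimensions, $f$ is therefore constant along a subspace of dimension at least $1000-6=994$, a structure that an embedding-based method such as ALEBO can exploit and that CobBO does not assume.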


\begin{figure}[ht]
    \centering
    \includegraphics[width=0.32\textwidth, height=!]{supplementary/ackley10-alebo-2.pdf}
    \includegraphics[width=0.32\textwidth, height=!]{supplementary/ackley10-alebo-4.pdf}
    \includegraphics[width=0.32\textwidth, height=!]{supplementary/ackley10-alebo-8.pdf}
  \caption{Comparison of ALEBO-2 (left), ALEBO-4 (middle) and ALEBO-8 (right) with CobBO on Ackley (10D)}
  \label{fig:alebo}
\end{figure}
%However, it takes ALEBO $12$ hours on average and only $10$ minutes for CobBO to finish the experiment.
In the second case, ALEBO performs poorly and is outperformed by CobBO, TuRBO and CMA-ES. 
% Regarding the computation times, it takes $6$ to $12$ hours for ALEBO and only $3$ minutes for CobBO to finish $500$ queries for each experiment on our testbed for the second case. 





 %Confidence intervals ($95\%$) are computed by repeating 10 independent experiments for each problem, as shown in Fig.~\ref{fig:200d}.
%  \begin{figure*}[htb]\vspace{-2mm}
%   \centering
%   \includegraphics[width=0.98\linewidth,height=!]{levy200d.png}\vspace{-3mm}
% %   \includegraphics{levy200d.png}\vspace{-3mm}
% \caption{Performance over high dimensional problems: the 200D Levy (left) and Ackley (middle) functions and the 102D half-cheetah control problem (right)}\vspace{-1.5mm}
%   \label{fig:200d}
%  \end{figure*}
%  \begin{figure}[htb]\vspace{-2mm}
%   \centering
%   \includegraphics[width=0.75\columnwidth,height=!]{high_v.png}\vspace{-3mm}
% %   \includegraphics{levy200d.png}\vspace{-3mm}
% \caption{Performance over high dimensional problems: the 200D Levy (upper) and Ackley (upper) functions}\vspace{-5mm}
%   \label{fig:200d}
%  \end{figure}


%\textbf{Half-cheetah control problem (maximization):}
%This is a model-free reinforcement learning problem ($102$ dimensions) provided by OpenAI gym~\cite{cheetah}.
% to  maximize the accumulated rewards.
%It has been shown that Augmented Random Search~\cite{mania2018,ars}, a random search method based on gradient estimation, in conjunction with a linear control policy, can achieve state-of-the-art sample efficiency and a competitive performance. 
% We apply the linear control policy as in~\cite{mania2018,ars}, governed by 102 unknown parameters to be searched over $[-0.1,0.1]^{102}$. The results in Fig.~\ref{fig:200d} demonstrate that Bayesian optimization can also efficiently find comparable solutions.
% \niv{Comparable to what ? what is the baseline score ?}
 





 


