

% \begin{figure*}[!ht]
% \begin{center}
%   \includegraphics[width=0.95\linewidth,height=!]{figures/ablation.png}
% %   \includegraphics{lunar-robot.png}
% \end{center}
%   \caption{Ablation study using Rastrigin on $[-5,10]^{50}$ with $20$ initial samples}
%   \label{fig:ablation}
% \end{figure*}
This section presents detailed ablation studies of the key components %presented in Section~\ref{sec:algorithm} 
and comparisons with other algorithms.
The experiments were run on a machine with an Intel(R) Xeon(R) E5-2682 v4 CPU at 2.50\,GHz, 32\,GB of memory, and an NVIDIA Tesla P100 PCIe GPU with 16\,GB of memory.


\subsection{Ablation study and empirical analysis}\label{ss:ablation}
The ablation studies quantify the contribution of each key component of
Algorithm~\ref{alg:top}, using the Rastrigin function on $[-5,10]^{50}$ with $20$ initial points.
Fig.~\ref{fig:ablation_trio} reports $95\%$ confidence intervals over $10$ independent runs for each configuration.
%for $500$ queries.
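For reference, we assume the standard form of the Rastrigin objective, $f(\mathbf{x}) = 10d + \sum_{i=1}^{d}\left(x_i^2 - 10\cos(2\pi x_i)\right)$ with $d=50$. The Python sketch below evaluates it and computes a $95\%$ confidence interval for the mean over independent runs. The interval construction (Student-$t$ with $n-1$ degrees of freedom) is an assumption, since the section does not state how its intervals are computed, and the values in the usage lines are synthetic placeholders rather than results from Fig.~\ref{fig:ablation_trio}.

```python
import numpy as np
from scipy import stats


def rastrigin(x) -> float:
    """Standard Rastrigin function: 10*d + sum_i (x_i^2 - 10*cos(2*pi*x_i)).

    Global minimum 0 at the origin, which lies inside the box [-5, 10]^d.
    """
    x = np.asarray(x, dtype=float)
    return float(10.0 * x.size + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x)))


def confidence_interval_95(values):
    """Student-t 95% interval for the mean of independent runs.

    Assumption: the section does not say how its intervals are computed.
    """
    values = np.asarray(values, dtype=float)
    n = values.size
    mean = values.mean()
    sem = values.std(ddof=1) / np.sqrt(n)            # standard error of the mean
    half_width = stats.t.ppf(0.975, df=n - 1) * sem  # two-sided 95% quantile
    return mean - half_width, mean + half_width


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic placeholder data: f at one uniform point in [-5, 10]^50 per "run".
    runs = [rastrigin(rng.uniform(-5.0, 10.0, size=50)) for _ in range(10)]
    print(confidence_interval_95(runs))
```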

\textbf{Coordinate blocks of a varying size: } 
%As a variant of coordinate ascent, 
CobBO selects a block of coordinates $C_t$ of a varying size, as described in Section~\ref{ss:block}. 
%The above ablation study in Fig.~\ref{fig:upper_bound}
CobBO is robust to the choice of the upper bound on the block size $|C_t|$ (Appendix~6), and
Fig.~\ref{fig:ablation_trio} (left) shows that a varying block size outperforms a fixed one.
Although the average block size of CobBO is $15$ in this setting, varying the size lets it combine the fast exploration of larger blocks (e.g., $22$) with the efficient exploitation of smaller ones (e.g., $6$).
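A minimal Python sketch of drawing a block $C_t$ of varying size from the preference probabilities $\pi_t$ is given below. The uniform distribution of the block size on $\{1,\dots,\texttt{max\_block}\}$ and the helper name \texttt{sample\_block} are hypothetical choices made for illustration; the actual selection rule is the one described in Section~\ref{ss:block}. A fixed-size baseline corresponds to always passing the same \texttt{size} to \texttt{rng.choice}.

```python
import numpy as np


def sample_block(pi, max_block: int, rng: np.random.Generator) -> np.ndarray:
    """Draw a coordinate block C_t whose size varies between iterations.

    Hypothetical sketch: the block size is uniform on {1, ..., max_block} and
    the coordinates are drawn without replacement with probabilities pi.
    """
    pi = np.asarray(pi, dtype=float)
    pi = pi / pi.sum()                            # normalise the preferences
    size = int(rng.integers(1, max_block + 1))    # varying size in [1, max_block]
    size = min(size, int(np.count_nonzero(pi)))   # cannot exceed the support of pi
    return rng.choice(pi.size, size=size, replace=False, p=pi)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    uniform_pi = np.full(50, 1.0 / 50)
    sizes = [len(sample_block(uniform_pi, 30, rng)) for _ in range(1000)]
    print(min(sizes), max(sizes), np.mean(sizes))
```

Because larger blocks explore more directions per subspace while smaller ones fit the subspace model with fewer points, randomizing the size is one simple way to mix both regimes.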




\textbf{RBF interpolation in the first stage: }
%\textbf{RBF interpolation:} 
RBF interpolation is cheap to compute, which matters in high dimensions.
Fig.~\ref{fig:motivation} (left) compares the computation time of plain Bayesian optimization with that of CobBO. The former applies the Mat\'{e}rn kernel directly in the high-dimensional space, whereas the latter applies RBF interpolation in the high-dimensional space and the Mat\'{e}rn kernel only in the low-dimensional subspace. This two-stage kernel method yields a significant speed-up. Other efficient alternatives for the first stage include inverse distance weighting~\cite{idw} and the simple approach of assigning the value of the nearest observed neighbour.
Fig.~\ref{fig:ablation_trio} (middle) shows that RBF interpolation performs better than these alternatives.
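The three first-stage alternatives compared in Fig.~\ref{fig:ablation_trio} (middle) can be sketched in Python with NumPy as follows. The Gaussian kernel, the shape parameter \texttt{eps}, the small ridge term, and the IDW power of $2$ are assumptions for illustration, not necessarily CobBO's settings; \texttt{scipy.interpolate.RBFInterpolator} provides a more complete RBF implementation. Fitting the RBF interpolant amounts to a single linear solve whose size is the number of observed points.

```python
import numpy as np


def _pairwise_dist(a, b):
    # Euclidean distances between rows of a (m x d) and rows of b (n x d).
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)


def rbf_interpolate(X, y, Xq, eps=1.0, ridge=1e-10):
    """Gaussian RBF interpolation: solve (Phi + ridge*I) w = y, predict Phi_q w."""
    phi = np.exp(-(eps * _pairwise_dist(X, X)) ** 2)
    w = np.linalg.solve(phi + ridge * np.eye(len(X)), y)
    return np.exp(-(eps * _pairwise_dist(Xq, X)) ** 2) @ w


def idw_interpolate(X, y, Xq, power=2.0):
    """Inverse distance weighting; returns the observed value at exact matches."""
    d = _pairwise_dist(Xq, X)
    out = np.empty(len(Xq))
    for i, row in enumerate(d):
        hit = np.flatnonzero(row == 0.0)
        if hit.size:
            out[i] = y[hit[0]]           # query coincides with an observation
        else:
            w = row ** (-power)          # closer points get larger weights
            out[i] = w @ y / w.sum()
    return out


def nearest_neighbour(X, y, Xq):
    """Assign the value of the closest observed point."""
    return y[np.argmin(_pairwise_dist(Xq, X), axis=1)]
```

All three return the observed values at the observed points; they differ in how they fill the space between observations.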

\textbf{Backoff stopping rule: } CobBO applies a stopping rule to query a variable number of points in subspace $\Omega_t$ (Section~\ref{ss:backoff}).
%which balances between accurate observations and inaccurate estimations in $\Omega_t$. 
To validate its effectiveness, we compare it with schemes that spend a fixed budget of queries in each $\Omega_t$. Fig.~\ref{fig:ablation_trio} (right) shows that the stopping rule yields better results: it combines the fast exploration of small per-subspace budgets (e.g., $1$ or $2$) with the efficient exploitation of large ones (e.g., $16$). Moreover, the best fixed number of queries per subspace varies across problems, whereas the backoff stopping rule adapts to each problem and achieves good performance without such tuning.
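To make the contrast with fixed budgets concrete, the following Python sketch compares a fixed number of queries per subspace with an adaptive rule that keeps querying while the incumbent improves. The patience-based rule and the names \texttt{query\_subspace}, \texttt{query\_fixed}, \texttt{patience}, and \texttt{max\_queries} are hypothetical stand-ins, not the exact backoff rule of Section~\ref{ss:backoff}; \texttt{propose} is assumed to evaluate the objective at the next point suggested in $\Omega_t$, and minimization is assumed.

```python
from typing import Callable, Tuple


def query_subspace(
    propose: Callable[[], float],
    best: float,
    patience: int = 3,
    max_queries: int = 16,
) -> Tuple[float, int]:
    """Hypothetical adaptive rule: query until `patience` consecutive queries
    bring no improvement or `max_queries` is reached. Returns (best, #queries)."""
    failures, used = 0, 0
    while used < max_queries and failures < patience:
        value = propose()
        used += 1
        if value < best:
            best, failures = value, 0   # progress: keep exploiting this subspace
        else:
            failures += 1               # no progress: move toward switching
    return best, used


def query_fixed(propose: Callable[[], float], best: float, budget: int) -> Tuple[float, int]:
    """Fixed-budget baseline: always spend exactly `budget` queries."""
    for _ in range(budget):
        best = min(best, propose())
    return best, budget
```

A subspace that keeps yielding improvements consumes up to \texttt{max\_queries} evaluations, while an unproductive one is abandoned after \texttt{patience} evaluations, which mirrors the exploration and exploitation trade-off described above.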
%better than others with fixed query budgets. 





% \textbf{Forming trust regions on two different time scales:}
% CobBO alternates between coarse and fine trust regions on slow and fast time scales, respectively (Section~\ref{ss:2tr}). 
% Figure~\ref{fig:ablation} (c) compares CobBO with two other schemes: without any trust regions and forming only coarse trust regions. Two time scales show better results. 

% \textbf{Escaping trapped optima:} 
% %CobBO applies two methods to escape trapped local optima.
% Figure~\ref{fig:ablation} (d) shows that the way CobBO escapes local optima (Section~\ref{ss:escape}) by decreasing $M_{t-1}$ and setting $V_{t}$ as a selected random point is beneficial.

% \begin{figure}
%     \centering
% %   \includegraphics[width=0.4\textwidth,height=!]{figures/select_prob.pdf}
%   \includegraphics[width=0.5\textwidth,height=!]{figures/probs-50d.pdf}
%   \caption{The preference probability focuses on active coordinates}% as the entropy decreases}
%   \label{fig:select_prob}
% \end{figure}

% \begin{wrapfigure}{r}{0.5\textwidth}
% \begin{center}
%   \includegraphics[width=0.45\textwidth,height=!]{figures/select_prob.pdf}
% \end{center}
%   \caption{The preference probability focuses on active coordinates as the entropy decreases}
%   \label{fig:select_prob}
% \end{wrapfigure}

% \newpage
% % \begin{wrapfigure}{r}{0.5\textwidth}
% \begin{figure}
%     \centering
%   \includegraphics[width=0.45\textwidth,height=!]{figures/probs-50d.pdf}
%   \caption{The preference probability focuses on active coordinates}% as the entropy decreases}
%     % \caption{Active coordinates are better selected.}% as the entropy decreases}
%   \label{fig:select_prob}
% \end{figure}
% % \end{wrapfigure}
\textbf{Preference probability over coordinates: } To demonstrate the effectiveness of coordinate selection (Section~\ref{ss:block}), we artificially make the function value depend only on the first $25$ of its $50$ input coordinates, ignoring the rest. This splits the coordinates into an active set and an inactive set, and we expect CobBO to refrain from selecting inactive coordinates. Fig.~\ref{fig:select_prob} shows, at each iteration $t$, the total preference probability of picking an active coordinate, $\sum_{i=1}^{25}\pi_{t,i}$, and an inactive one, $\sum_{i=26}^{50}\pi_{t,i}$. The preference distribution concentrates on the active coordinates.
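The construction can be sketched in Python as below, assuming for illustration that the base objective is the Rastrigin function of the ablation setting; \texttt{rastrigin\_first\_k} and \texttt{active\_inactive\_mass} are hypothetical helper names. The second helper computes the two aggregated probabilities plotted in Fig.~\ref{fig:select_prob} from a preference vector $\pi_t$.

```python
import numpy as np

D, N_ACTIVE = 50, 25  # input dimension and number of active coordinates


def rastrigin_first_k(x, k: int = N_ACTIVE) -> float:
    """Rastrigin evaluated on the first k coordinates only; the rest are ignored."""
    z = np.asarray(x, dtype=float)[:k]
    return float(10.0 * k + np.sum(z**2 - 10.0 * np.cos(2.0 * np.pi * z)))


def active_inactive_mass(pi, k: int = N_ACTIVE):
    """Return (sum of pi over active coords 1..k, sum over inactive coords k+1..D)."""
    pi = np.asarray(pi, dtype=float)
    return float(pi[:k].sum()), float(pi[k:].sum())
```

Changing any of the last $25$ coordinates leaves \texttt{rastrigin\_first\_k} unchanged, so a selection scheme that learns from observed improvements has no incentive to keep picking them.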
