\documentclass{article}

\usepackage{icml2021}
% If your paper is accepted, change the options for the package
% icml2021 as follows:
%
%\usepackage[accepted]{icml2021}
%
% This option will print headings for the title of your paper and
% headings for the authors names, plus a copyright note at the end of
% the first column of the first page.

% If you set papersize explicitly, activate the following three lines:
%\special{papersize = 8.5in, 11in}
%\setlength{\pdfpageheight}{11in}
%\setlength{\pdfpagewidth}{8.5in}

% If you use natbib package, activate the following three lines:
%\usepackage[round]{natbib}
%\renewcommand{\bibname}{References}
%\renewcommand{\bibsection}{\subsubsection*{\bibname}}

% If you use BibTeX in apalike style, activate the following line:
%\bibliographystyle{apalike}

\usepackage{makecell}
\usepackage{xcolor}
\usepackage{graphicx}
\graphicspath{ {figures/} }

\newcommand{\jian}[1]{\begin{center}\fbox{\parbox{3in}{{\textcolor{blue}{Jian: #1}}}}\end{center}}
\newcommand{\niv}[1]{\begin{center}\fbox{\parbox{3in}{{\textcolor{red}{Niv: #1}}}}\end{center}}

\begin{document}

% If your paper is accepted and the title of your paper is very long,
% the style will print as headings an error message. Use the following
% command to supply a shorter title of your paper so that it can be
% used as headings.
%
% \runningtitle{I use this title instead because the last one was very long}

% If your paper is accepted and the number of authors is large, the
% style will print as headings an error message. Use the following
% command to supply a shorter version of the authors names so that
% they can be used as headings (for example, use only the surnames)
%
%\runningauthor{Surname 1, Surname 2, Surname 3, ...., Surname n}

% Supplementary material: To improve readability, you must use a single-column format for the supplementary material.
\onecolumn
\icmltitle{CobBO: Supplementary Materials}
\section*{Auxiliary features of CobBO}\label{ss:auxiliary}
Filtering out clustered queried points further smooths and accelerates the optimization, while alternating between adaptive trust regions promotes exploration in the interior of the domain and helps escape local optima.

%Computational growth with query budget}: 
The runtime of each iteration of Gaussian process regression scales cubically with the number of queried points, so the computational cost can become prohibitive beyond a limited query budget.
The complexity can be reduced to quadratic by carefully reusing the Cholesky factorization across iterations~\cite{bayesopt,lazygaussian2020}, or even to linear by assuming additive structures~\cite{mutny2018}. Nevertheless, these methods are not generally applicable in our setting.
Instead, we follow the idea of approximate Gaussian process regression~\cite{candela2005,bui2017}, which describes the prior with fewer points.
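To illustrate where the quadratic cost mentioned above comes from, the following sketch extends an existing Cholesky factor by one new observation through a single triangular solve, instead of refactorizing the whole kernel matrix. It is a minimal NumPy sketch under our own assumptions (a squared-exponential kernel with unit signal variance, a fixed noise level, and the hypothetical helpers \texttt{rbf\_kernel}, \texttt{forward\_substitution} and \texttt{cholesky\_append}); it is not the implementation used in~\cite{bayesopt,lazygaussian2020}.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel (assumed choice) between the rows of A and B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale ** 2)


def forward_substitution(L, b):
    """Solve L y = b for a lower-triangular L in O(n^2) operations."""
    y = np.zeros(len(b))
    for i in range(len(b)):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y


def cholesky_append(L, X, x_new, noise=1e-3, lengthscale=1.0):
    """Extend the factor L of K(X, X) + noise * I by one new point x_new.

    If K = L L^T, the enlarged matrix [[K, k], [k^T, c]] has the factor
    [[L, 0], [l^T, d]] with l = L^{-1} k and d = sqrt(c - l^T l), so only one
    O(n^2) triangular solve is needed instead of an O(n^3) factorization.
    """
    k = rbf_kernel(X, x_new[None, :], lengthscale)[:, 0]
    c = 1.0 + noise  # k(x_new, x_new) = 1 for this kernel, plus observation noise
    l = forward_substitution(L, k)
    n = L.shape[0]
    L_new = np.zeros((n + 1, n + 1))
    L_new[:n, :n] = L
    L_new[n, :n] = l
    L_new[n, n] = np.sqrt(c - l @ l)
    return L_new
```

Each appended point costs $O(n^2)$ this way, compared with $O(n^3)$ for refactorizing from scratch; the overall cost over a long run still grows, which is why cheaper data reduction can remain attractive.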

\textbf{Data filtering by K-means clustering:}
To handle the cubic computational cost in the number of queries~\cite{snoek2012} without resorting to sophisticated approximate Gaussian process regression~\cite{candela2005,bui2017}, we apply the K-means algorithm~\cite{macqueen1967some} to discard clustered points once the number of aggregated observations exceeds a threshold, e.g., $1{,}000$.
Specifically, within each cluster we keep only the point with the maximal function value.
 Intuitively, for a maximization problem, if two nearby points have close function values, discarding the one with the smaller value seems innocuous. It can sometimes even help, since Bayesian optimization assumes the function $f(x)$ to be smooth, i.e., to lie in a reproducing kernel Hilbert space~\cite{bull2011}, and tightly clustered observations with differing values can conflict with this assumption.
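The filtering step can be sketched as follows. This is a minimal NumPy sketch under our own assumptions: a plain Lloyd's K-means with random initialization, the example threshold of $1{,}000$ points from the text, and a hypothetical number of clusters \texttt{n\_clusters}; the clustering parameters actually used by CobBO are not specified here.

```python
import numpy as np


def kmeans_labels(X, n_clusters, n_iters=30, seed=0):
    """Plain Lloyd's K-means; returns the cluster label of every row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)].astype(float)
    for _ in range(n_iters):
        # Squared distances via ||x||^2 + ||c||^2 - 2 x.c (avoids an n x k x d array).
        d2 = (X ** 2).sum(1)[:, None] + (centers ** 2).sum(1)[None, :] - 2.0 * X @ centers.T
        labels = d2.argmin(1)
        for j in range(n_clusters):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster becomes empty
                centers[j] = members.mean(0)
    return labels


def filter_clustered_points(X, y, threshold=1000, n_clusters=500, seed=0):
    """Keep only the point with the largest y in each cluster (maximization).

    Filtering is applied only once more than `threshold` observations exist;
    `n_clusters` is a hypothetical choice.
    """
    if len(X) <= threshold:
        return X, y
    labels = kmeans_labels(X, min(n_clusters, len(X)), seed=seed)
    keep = [np.flatnonzero(labels == j)[np.argmax(y[labels == j])]
            for j in np.unique(labels)]
    keep = np.sort(keep)
    return X[keep], y[keep]
```

Because the best point of every cluster survives, the global best observation is never discarded.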
 
 \textbf{Escaping stagnant local optima:} When the number of consecutive trials that fail to improve the optimization process exceeds a threshold, we temporarily decrease the function values around the stagnant local optimum. By doing so, the Gaussian process regression encourages the exploration of other, potentially more promising, areas.
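A minimal sketch of this mechanism, under our own assumptions: stagnation is declared after a hypothetical number \texttt{patience} of consecutive non-improving queries, and the values of observations within a hypothetical \texttt{radius} of the stagnant optimum are lowered by a hypothetical \texttt{penalty} only in a copy of the data used to fit the Gaussian process, so the original observations stay untouched.

```python
import numpy as np


class StagnationDetector:
    """Counts consecutive queries that fail to improve the best value (maximization)."""

    def __init__(self, patience=20):
        self.patience = patience  # hypothetical threshold on consecutive failures
        self.best = -np.inf
        self.failures = 0

    def update(self, y_new):
        """Record a new observation; return True once the search is stagnant."""
        if y_new > self.best:
            self.best, self.failures = y_new, 0
        else:
            self.failures += 1
        return self.failures >= self.patience


def penalize_near(X, y, x_star, radius, penalty):
    """Return a copy of y with values within `radius` of x_star lowered by `penalty`.

    The copy is meant only for temporarily fitting the GP; y itself is unchanged.
    """
    near = np.linalg.norm(X - x_star, axis=1) <= radius
    y_fit = y.copy()
    y_fit[near] -= penalty
    return y_fit
```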

\textbf{Batch queries:} %\label{ss:batch}
Because it operates on sampled coordinate subspaces, CobBO can be easily parallelized in a batch mode.
Specifically, we can sample multiple coordinate subspaces, each containing the latest observed pivot point $V_t$.
Since the batch mode does not require synchronization, concurrent subspaces need not share an identical $V_t$.
In principle, other batch methods~\cite{turbo2019,desautels14,emile2013,javier2016,tarun2016,javad2010,wilson2017reparameterization} can also be integrated with CobBO.
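The sketch below shows how a batch of coordinate subspaces, each containing the pivot point $V_t$, can be formed. For simplicity it assumes that coordinates are selected uniformly at random with a hypothetical subspace dimension \texttt{k}; this is a simplification and not necessarily CobBO's coordinate selection rule. A point proposed in a subspace is embedded into the full space by copying $V_t$ and overwriting only the selected coordinates.

```python
import numpy as np


def sample_subspaces(dim, k, batch_size, rng):
    """Draw `batch_size` coordinate subsets of size k (uniformly at random, an assumption)."""
    return [np.sort(rng.choice(dim, size=k, replace=False)) for _ in range(batch_size)]


def embed(v_pivot, coords, z):
    """Map a point z of a coordinate subspace back to the full space.

    Coordinates outside `coords` are fixed to the pivot v_pivot, so every
    subspace contains v_pivot (take z = v_pivot[coords]).
    """
    x = v_pivot.copy()
    x[coords] = z
    return x
```

Each subspace in the batch can then be optimized independently, which is why no synchronization on a common pivot is required.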



\section*{Impacts of the key/auxiliary features on the performance}
CobBO is configured with default settings, including a stopping rule for determining the number of 
consistent queries and the strategies to form coarse and refined trust regions on slow and fast time scales, respectively. 
To evaluate the impact of these configurations, we test the following combinations:
\begin{itemize}\vspace{-3mm}%\setlength\itemsep{0em}
\item number of consistent queries $\in \{\rm{stopping\;rule}, \;\rm{fixed\; constant}\; q_{\rm{max}}\}$ %with $q_{\rm{max}}$ being the maximum number of consistent queries
\item $S \in\{\rm{true},\rm{false}\}$, whether or not to employ coarse trust regions on a slow time scale
\item $F \in\{\rm{true},\rm{false}\}$, whether or not to employ refined trust regions on a fast time scale
\end{itemize}\vspace{-3mm}
%
The fixed constant $q_{\rm{max}}$ represents the maximum number of consistent queries that can be continuously imposed on the 
currently selected coordinate subspace. 
It controls the trade-off between exploiting the potential of the current coordinate subspace and exploring new subspaces. 
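A fixed $q_{\rm{max}}$ can be sketched as a simple schedule: the same coordinate subspace receives at most $q_{\rm{max}}$ consecutive queries before a new one is selected. The sketch is illustrative only: the hypothetical callback \texttt{select\_subspace} stands in for CobBO's subspace selection, and the stopping rule used by the default configuration is not modeled.

```python
from typing import Callable, Iterator, List


def fixed_qmax_schedule(select_subspace: Callable[[], List[int]],
                        q_max: int, budget: int) -> Iterator[List[int]]:
    """Yield the coordinate subspace used for each query under a fixed q_max."""
    if q_max < 1:
        raise ValueError("q_max must be at least 1")
    queries = 0
    while queries < budget:
        subspace = select_subspace()  # explore: switch to a newly selected subspace
        for _ in range(min(q_max, budget - queries)):
            yield subspace  # exploit: reuse the same subspace for consecutive queries
            queries += 1
```

A small $q_{\rm{max}}$ switches subspaces often (more exploration), whereas a large one spends more of the budget inside each subspace (more exploitation).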
%Conceptually, more consistent queries exploit the potential of the coordinate subspace, at the risk of missing better solutions of other subspaces due to the limited total budget. 
%
When coarse trust regions are enabled (i.e., $S=\rm{true}$) on a slow time scale, 
the procedure will exploit a neighborhood of $V_{t}$ instead of the full domain.  
%
If refined trust regions are used (i.e., $F=\rm{true}$) on a fast time scale, the alternation between coarse and refined trust regions
can help Bayesian optimization better exploit the selected regions centered at~$V_{t}$. 
This alternation also helps distribute new queries over both the central area and the boundary areas. 
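As a purely illustrative sketch of such an alternation (the every-other-iteration schedule and the hypothetical half-widths \texttt{coarse\_width} and \texttt{refined\_width} are our assumptions, not CobBO's actual update rules), the box-shaped region around $V_t$ below switches between a coarse and a refined size and is clipped to the domain bounds.

```python
import numpy as np


def trust_region(v_t, lower, upper, iteration, coarse_width, refined_width):
    """Box around v_t whose half-width alternates between coarse and refined.

    Even iterations use the coarse half-width and odd iterations the refined
    one (an assumed schedule); the box is clipped to [lower, upper].
    """
    half = coarse_width if iteration % 2 == 0 else refined_width
    return np.maximum(lower, v_t - half), np.minimum(upper, v_t + half)
```

Sampling candidates alternately from the large and the small box places some queries close to $V_t$ and others near the edges of the coarse region.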
%
%Coarse trust regions can be considered as a trade-off between the refined small trust regions and the original domain.  
Through extensive experiments, we empirically test how these features contribute to the performance of CobBO. 
\vspace{-2mm}

We apply CobBO to the 30-dimensional synthetic functions (Ackley, Levy and Rastrigin) and to the robot pushing problem 
under $6$ different settings, as shown in the following table:

\begin{table}[h]
\caption{Tested configurations of CobBO}
\label{tab:settings}
\begin{center}
\begin{tabular}{lcccccc}
\hline
         &  $\rm{CobBO}^{\ast}$  & $\rm{CobBO}^{1}$ & $\rm{CobBO}^{2}$ & $\rm{CobBO}^{3}$ & $\rm{CobBO}^{4}$ & $\rm{CobBO}^{5}$ \\ 
\hline
consistent queries    & stopping rule      &  stopping rule         & stopping rule        & stopping rule       &  1       & 15 \\
$S$  &true  &  false    & true   &  false  & true    & true  \\
$F$  & true  &  false    & false  &  true   &  true   & true   \\
\hline
\end{tabular}
\end{center}
\end{table}
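For bookkeeping in the analysis below, the six settings in the table can be encoded as data, which makes it easy to check which factors differ between any two configurations. This is a plain transcription of the table into Python; the names \texttt{CONFIGS} and \texttt{differing\_factors} are our own.

```python
from typing import Dict, Set, Union

# Transcription of the table: consistent-query rule, S (coarse, slow), F (refined, fast).
CONFIGS: Dict[str, Dict[str, Union[str, int, bool]]] = {
    "CobBO*": {"consistent_queries": "stopping rule", "S": True, "F": True},
    "CobBO1": {"consistent_queries": "stopping rule", "S": False, "F": False},
    "CobBO2": {"consistent_queries": "stopping rule", "S": True, "F": False},
    "CobBO3": {"consistent_queries": "stopping rule", "S": False, "F": True},
    "CobBO4": {"consistent_queries": 1, "S": True, "F": True},
    "CobBO5": {"consistent_queries": 15, "S": True, "F": True},
}


def differing_factors(a: str, b: str) -> Set[str]:
    """Return the factors on which configurations a and b disagree."""
    return {key for key in CONFIGS[a] if CONFIGS[a][key] != CONFIGS[b][key]}
```

For example, $\rm{CobBO}^{3}$ and $\rm{CobBO}^{4}$ differ in both the consistent-query rule and $S$, but not in $F$, so comparisons between them cannot isolate a single factor.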
Note that $\rm{CobBO}^{\ast}$ is the default setting used to generate the experimental results in the main paper. 
We assign a budget of $2{,}500$ evaluations to Ackley, Levy and Rastrigin, and $7{,}000$ evaluations 
to the robot pushing problem.
For each configuration and each problem, we repeat $30$ independent experiments and plot the 95\% confidence intervals.  
The tested value of $q_{\rm{max}}$ is chosen to be $2$ for $2{,}500$ evaluations and $3$ for $7{,}000$ evaluations. 
\niv{But in the table it specifies $q_{max}\in\{1,15\}$}
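The confidence bands can be computed as in the following sketch. It assumes that each run is summarized by the best value found so far (for maximization) and that the 95\% interval uses a Student-$t$ critical value with $29$ degrees of freedom for $30$ runs; both are our assumptions, since the exact procedure is not stated here.

```python
import numpy as np

T_CRIT_95_DF29 = 2.045  # two-sided 95% Student-t critical value for 30 runs (29 dof)


def best_so_far(values):
    """Running maximum of each run's query values; values has shape (runs, evals)."""
    return np.maximum.accumulate(values, axis=1)


def confidence_band(values, t_crit=T_CRIT_95_DF29):
    """Mean best-so-far curve and its 95% lower and upper bounds across runs."""
    curves = best_so_far(values)
    n_runs = curves.shape[0]
    mean = curves.mean(axis=0)
    half_width = t_crit * curves.std(axis=0, ddof=1) / np.sqrt(n_runs)
    return mean, mean - half_width, mean + half_width
```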

\begin{figure}[hbt]
\begin{center}
\includegraphics[width=0.98\columnwidth,height=!]{app-synthetic-30.png}
% \includegraphics{app-synthetic-30.png}
\end{center}\vspace{-3mm}
\caption{Performance of different configurations on the 30-dimensional synthetic problems}
\label{fig:d30}
\end{figure}
\vspace{-2mm}

On these three synthetic problems, the performance of the different configurations is close, as shown in Fig.~\ref{fig:d30}. 
This indicates that CobBO is not sensitive to these configurations in these cases. 
Nevertheless, small differences remain. \vspace{-2mm}
%
%
%On average, CobBO(3), CobBO(4)  and CobBO(5) outperform the other two groups eventually on all three problems. 
%It implies that with fast trust regions CobBO tends to get better global performance.

%CobBO(1) performs better than CobBO(2) at early stages, while CobBO(2) eventually catches up with or even surpasses CobBO(1) . Similar patterns can also be found between CobBO(3) and CobBO(4).  With slow trust regions enabled, CobBO(2) and CobBO(4) are able to explore regions with finer granularity. This exploitation-inclined setting makes CobBO converge slowly in the early stage while achieve competitive solution qualities in the end. 
%
$\rm{CobBO}^{5}$, with a larger $q_{\rm{max}}$ value, performs slightly worse than $\rm{CobBO}^{3}$ and $\rm{CobBO}^{4}$, 
but better than $\rm{CobBO}^{1}$ and $\rm{CobBO}^{2}$. This suggests that $q_{\rm{max}}$ and $F$ have a stronger impact on the performance than 
$S$ in these cases. 
%
With the fast trust region feature enabled ($F = \rm{true}$),   
$\rm{CobBO}^{3}$ encourages more exploitation within smaller neighborhoods around the current best solutions,  and 
consistently outperforms  $\rm{CobBO}^{1}$ and  $\rm{CobBO}^{2}$ on all three problems. \vspace{-2mm}

%It indicates that enabling the fast trust region feature, which encourages global exploration, contributes to better solution qualities in these cases.

 \begin{figure}[hbt]
\begin{center}
\includegraphics[width=0.6\columnwidth,height=!]{rpush.png}
% \includegraphics{rpush.png}
\end{center}\vspace{-3mm}
\caption{Performance of different configurations on the robot pushing problem}
\label{fig:push}
\end{figure}

For the robot pushing problem, as shown in Fig.~\ref{fig:push}, the results of the $6$ configurations are not significantly different from each other either. 
Specifically, $\rm{CobBO}^3$ slightly outperforms the other settings on average, consistent with the synthetic experiments. 
$\rm{CobBO}^5$ performs poorly, possibly due to its excessive exploitation of the current coordinate subspaces. 
Unlike in the synthetic cases, $\rm{CobBO}^1$ and $\rm{CobBO}^2$ find better solutions than $\rm{CobBO}^4$ and $\rm{CobBO}^5$ on average. 
This suggests that restricting the procedure to refined trust regions may have a negative impact on the performance in this case.
\niv{How can we know for sure that $F$ is the source of this negative impact? E.g., $\rm{CobBO}^1$ and $\rm{CobBO}^2$ also differ from $\rm{CobBO}^4$ and $\rm{CobBO}^5$ in $q_{max}$. Actually, $\rm{CobBO}^3$ differs from $\rm{CobBO}^4$ and $\rm{CobBO}^5$ in $q_{max}$ (and also in $S$) rather than in $F$, and is superior to these as well. So my guess is that the negative impact is attributable to $q_{max}$ rather than to $F$.}
The default setting uses refined trust regions and is therefore not as good as $\rm{CobBO}^1$ and $\rm{CobBO}^2$ on this problem. 
This indicates that better adaptive algorithms could be designed to further improve the performance of CobBO. 

\section*{Default hyper-parameter configuration of CobBO for all experiments}
\input{supplementary/parameters_table.tex}

\section*{Additional experiments}
In this section, we provide more experiments to demonstrate the performance of CobBO.

\subsection*{Separable and additive functions}



\section*{Small variance on the 200-dimensional Levy function}%\vspace{-5mm}

Fig. 5 in the main paper does not clearly show the variance of the sample paths of CobBO on the 200-dimensional Levy function. 
We zoom in and provide two partial views, as shown in Fig.~\ref{fig:200d-levy}.

 \begin{figure}[hbt]
\begin{center}
\includegraphics[width=0.6\columnwidth,height=!]{200d-large}
\end{center}\vspace{-9mm}
\caption{Zoomed-in views of Fig. 5 in the main paper, showing the variance of CobBO on the 200-dimensional Levy function}
\label{fig:200d-levy}
\end{figure}
\vspace{-2mm}

\end{document}