\section{Related work} \label{sec:related_work}
%Given the large body of work on Bayesian optimization~\cite{frazier2018,Brochu2010,shahriari2016}, 
%we summarize the most relevant ones on high dimensionality. 
%in the following topics: high dimensionality, 
%trust regions, 
%hybrid methods and batch sampling.
%In Section~\ref{s:exp}, we compare CobBO with selected and competitive algorithms. 
% For an overview of Bayesian optimization see.
%\textbf{High dimensionality:}
%To apply Bayesian optimization in high dimensions, 
Certain assumptions
are often imposed on the latent structure in high dimensions. 
Typical assumptions include low-dimensional embeddings and additive structures, whose advantages manifest on problems with a low effective dimension. 
However, these assumptions do not always hold in practice, e.g., for non-separable functions without redundant dimensions.

\noindent \textbf{Low-dimensional embedding:} 
The function $f$ is assumed to have a low effective dimension~\cite{kushner1964,hemant2014}, e.g., $f(x) = g(\Phi x)$ for a function $g(\cdot)$ and a matrix $\Phi$ of size $d\times D$ with $d \ll D$. This essentially assumes that $f(x)$ does not change along certain directions. More generally, a non-linear auto-encoder can also be used to find the embedding.
A variety of methods have been developed, including
random embedding~\cite{josip2013,ziyuw2016,chaudhuri2019,binois2019,letham2020},
Hashing-enhanced Subspace BO (HeSBO)~\cite{chaudhuri2019}, and ALEBO~\cite{letham2020}, which uses a Mahalanobis kernel. %for linear embeddings
%DROPOUT~\cite{dropoutbo} and LineBO~\cite{linebo}.
%~\cite{ziyu2013,josip2013,ziyuw2016,chuliang2016,chaudhuri2019,miao2019,binois2019,letham2020},
%(e.g., REMBO~\cite{ziyuw2016}),  
%low-rank matrix recovery~\cite{josip2013,hemant2014},
%and learning subspaces by derivative information~\cite{josip2013,eriksson2018}.
%In contrast to existing work on subspace selections,
%LineBO which receives a special treatment in Appendix~\ref{ss:linebo}, 
%CobBO efficiently leverages all the observations in the whole space using the two-stage kernels and the stopping rule in each subspace for consecutive observations. 
%rather than only relying on limited observations in each coordinate subspace. 
%While the simple first-stage kernel involves all the coordinates in the computation, the second stage conducts both the GP regression and the learning of the length scales of the more sophisticated kernel in the lower dimensional subspaces. 
%DROPOUT selects the active coordinates at every iteration and fill in the remaining coordinates using some heuristic strategy. CobBO uses a simple kernel applied on the full space to estimate the function values of newly added “virtual points” on the subspace from all past data.
%rather than simply starting from scratch in each subspace. 
%The two-stage kernel Gaussian process regressions fully leverage the observations in the whole space rather than only relying on observations in each coordinate subspace. 
Since not all real-world problems fit the low-dimensional embedding structure, CobBO is designed to optimize functions without redundant dimensions. It exploits the subspace structure regardless of the problem dimension.
Although embedding-based algorithms and CobBO rest on different assumptions, REMBO~\cite{ziyuw2016} and ALEBO~\cite{letham2020} are compared with CobBO in Appendix~1.
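
As a hypothetical illustration of the low effective dimension assumption (a toy example for exposition, not taken from the cited works), write $x = (z_1, z_2, z_3)^\top$ and let $D=3$, $d=1$ and $\Phi = (1, 1, 0)$, so that
\[
f(x) = g(\Phi x) = g(z_1 + z_2).
\]
Then $f(x + t v) = f(x)$ for every scalar $t$ and every $v$ in the null space of $\Phi$, which is spanned by $(1,-1,0)^\top$ and $(0,0,1)^\top$. In other words, the coordinate $z_3$ is redundant and $f$ depends on $z_1, z_2$ only through their sum.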
%As a result, it shows great performance in both high and low dimensions, 
%different from some algorithms that are more suitable for low dimensions, e.g., BADS~\cite{luigi2017}. 




%SI-BO~\cite{josip2013}

\noindent \textbf{Additive structure:}
%Sparse Gaussian processes~\cite{mitchell2016},
A common assumption is that $f$ decomposes as $f(x) = \sum_{i=1}^{k}f^{(i)}\left(x_{i} \right)$, where each $x_i$
is a low-dimensional group of coordinates. In this case, the effective dimensionality of the model is
the largest dimension among all additive groups~\cite{mutny2018}, which is usually small. 
The Gaussian process is then structured as an additive model~\cite{elad2013,kandasamy2015}.
%e.g.,  projected-additive functions~\cite{chuliang2016}, ensemble Bayesian optimization (EBO)~\cite{wang18aistats}, latent additive structural kernel learning (HDBBO)~\cite{zi2017} and group additive models~\cite{kandasamy2015,chuliang2016}. 
%Though this method effectively reduces
%the time complexity, it has been reported that the accuracy is often slightly lower than a full GP~\cite{elad2013}.  
However, learning the unknown structure incurs a considerable computational cost~\cite{chaudhuri2019}, and the additive model is not always applicable to non-separable functions, to which CobBO can still be applied. 
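
As a hypothetical illustration (a toy example for exposition, not taken from the cited works), let $D=3$ with coordinates $x = (z_1, z_2, z_3)^\top$, and take $k=2$ groups $x_1 = (z_1, z_2)$ and $x_2 = z_3$. The decomposition
\[
f(x) = f^{(1)}(z_1, z_2) + f^{(2)}(z_3)
\]
has groups of dimensions $2$ and $1$, so the effective dimensionality of the additive model is $2$. In contrast, $f(x) = z_1 z_2 z_3$ admits no such decomposition into $k \geq 2$ disjoint groups, since every mixed partial derivative $\partial^2 f / \partial z_i \partial z_j$ with $i \neq j$ is not identically zero.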
% in these scenarios.   %As a variant of the block coordinate ascent method, 
%CobBO can be applied for non-separable functions.  

% \emph{Kernel methods:}
% Various kernels have been used for resolving the difficulties in high dimensions, 
% e.g., 
% a hierarchical Gaussian process model~\cite{chen2019hierarchical}, a cylindrical kernel~\cite{bock2018} and a compositional kernel~\cite{david2013}.
% CobBO can be integrated with other sophisticated methods~\cite{snoek2012, david2013, bock2018,chen2019hierarchical,marchuk1975,jones1998,srinivas2010,frazier2008,scott2011,ziw2017}, e.g., ATPE/TPE~\cite{TPE2011,ATPE}
% and SMAC~\cite{HutHooLey11-smac}.

% \textbf{Trust regions and space partitions:}
\noindent \textbf{Trust regions and subspaces:}
Trust-region BO has proven effective for high-dimensional problems.
%A typical pattern is to alternate between global and local search regions. 
Within the local trust regions, many efficient methods have been applied, e.g., local Gaussian process models (TurBO~\cite{turbo2019}), adaptive search on a mesh grid (BADS~\cite{luigi2017}), or quasi-Newton local optimization (BLOSSOM~\cite{McLeod2018OptimizationFA}). 
TurBO~\cite{turbo2019} uses Thompson sampling to allocate samples across multiple regions.
A related approach is to partition the search space, e.g., LA-MCTS~\cite{Wang2020LearningSS} uses Monte Carlo tree search to learn efficient partitions. 
CobBO differs by selecting low-dimensional subspaces and using two-stage kernels. Apart from the aforementioned works on axis-aligned subspaces~\cite{dropoutbo,Oliveira2018,moriconi2020,Eriksson2021}, 
another closely related work is LineBO~\cite{linebo}. It significantly reduces the acquisition function optimization time by restricting the search to one-dimensional subspaces. However, as it uses a single kernel, it does not address the computational issues of the GP regression in the full space. 
Furthermore, CobBO selects the block size as well as the coordinates therein by a multiplicative weights update method~\cite{sanjeev12} applied to the preference probability associated with each coordinate.
Thus, it samples more promising subspaces with higher probability. 
See Appendix~2 for the comparison.
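
For reference, a generic multiplicative weights update can be sketched as follows. This is one common textbook form and not necessarily the exact rule used by CobBO. It maintains a weight $w_i > 0$ for each coordinate $i \in \{1,\dots,D\}$. Given a reward signal $r_i \in [-1, 1]$ (a hypothetical quantity introduced here for illustration, e.g., zero for coordinates that were not selected) and a step size $\eta \in (0, 1/2]$, one round performs
\[
w_i \leftarrow w_i\,(1 + \eta\, r_i), \qquad p_i = \frac{w_i}{\sum_{j=1}^{D} w_j},
\]
so that coordinates with positive reward receive a larger preference probability $p_i$ in subsequent rounds.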

%It can also incorporate trust regions, as shown in the Appendix.
%in the first-stage global approximation
% CobBO dynamically forms a variable-size trust region around the optimum of the already queried points, which can be switched to a different region to escape stagnant local optima. 

% \textbf{Hybrid methods:}
% Combining BO and other techniques yields hybrid approaches. %In addition to some of the above examples designed for trust region methods, others have been proposed. 
% Bayesian adaptive direct search (BADS)~\cite{luigi2017} alternates between local BO and grid search, which fits low dimensions, as commented in~\cite{luigi2017}. Gradients can be used with BO, e.g., derivative-enabled knowledge gradient (d-KG)~\cite{wujian2017}.  EGO-CMA~\cite{Hossein2015} is combined with CMA-ES~\cite{cmaes}.
% %Gaussian differential\cite{zhang2020scalable}, CMA~\cite{cmaes}
% %scalable hyperparameter optimization with lazy gaussian processes\cite{ram2020scalable}
% %In this regard, CobBO can be used to form other hybrid methods. 
% In this regard, CobBO can be viewed as a combination of block coordinate ascent and Bayesian optimization.
% %\niv{What can we say about CobBO with regard to hybrid methods ? Can CobBO be considered as a hybrid method ? What is its uniqueness among the hybrid methods family ? Is it complementary to some other hybrid methods ?}

% \textbf{Batch sampling:}  
% The leverage of parallel computation for BO requires a batch of queries at each iteration.
% Popular methods include, e.g., Batch Upper Confidence Bound (BUCB)~\cite{desautels14}, combining UCB and Pure Exploration by entropy reduction (UCB-PE)~\cite{emile2013}, local penalization~\cite{javier2016}, determinantal point processes~\cite{tarun2016}, Monte-Carlo simulation~\cite{javad2010}, sampling according to a reward function~\cite{desautels14} and using the reparameterization trick for acquisition functions~\cite{wilson2017reparameterization}. 
% These methods can be combined with CobBO. In addition, 
% %due to the sampling of coordinate subspaces, 
% CobBO can be paralleled in a batch mode by sampling multiple different subspaces simultaneously. %as detailed in Section~\ref{ss:batch}.  

%\cite{javad2010}The key idea is to exploit the availability of high-quality and efficient sequential policies, by using Monte-Carlo simulation to select input batches that closely match their expected behavior.
%\cite{desautels14}models the reward function as a sample from a Gaussian process and which can select batches of experiments to run in parallel
%\cite{emile2013}combines the UCB strategy and Pure Exploration in the same batch of evaluations along the parallel iterations. 
%remaining locations are selected via Pure Exploration restricted;  information gain about f by the locations. Formally, I pXq is the reduction of
%entropy when knowing the values of the observations Y at X

