\section{Problem Statement}

In this section, we introduce notation and formalize the problem. Consider a compact search space $\searchSpace\subseteq\mathbb{R}^d$. We aim to find a maximizer $\instance^*\in\argmax_{\instance\in\searchSpace}f(\instance)$ of a black-box function $f:\searchSpace\rightarrow\mathbb{R}$, subject to $\conNum$ black-box constraints $\cFunc_\conIdx(\instance)$, $\conIdx\in\conSpace =\{1,2,\dots,\conNum\}$, where each constraint is satisfied when its value stays above the corresponding threshold $h_\conIdx$.
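For simplicity and without loss of generality, 
we let all $h_\conIdx = 0$, since any threshold can be absorbed into its constraint by replacing $\cFunc_\conIdx(\instance)$ with $\cFunc_\conIdx(\instance) - h_\conIdx$.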
Formally, our goal is to find the \emph{interior optimum}:
    $$\max_{\instance\in\searchSpace}f(\instance)\text{~s.t.~} {\cFunc_\conIdx(\instance)> 0}, \quad \forall \conIdx\in\conSpace.$$

We maintain a Gaussian process ($\GP$)
as the surrogate model for each black-box function, pick a point $\instance_t\in\searchSpace$ at iteration $t$ by maximizing the acquisition function $\alpha: \searchSpace \rightarrow \reals$, 
and observe the function values perturbed by additive noise: 
$y_{\globalf,t} = \globalf(\instance_t) + \epsilon$ and $y_{\cFunc_\conIdx,t} = \cFunc_\conIdx(\instance_t) + \epsilon$, with $\epsilon \sim \mathcal{N}(0, \sigma^2)$ 
being i.i.d. Gaussian noise.
Each $\GP(\mu(\instance), k(\instance, \instance'))$ is fully specified by its prior mean $\mu$ and kernel $k$. Given the historical observations $\Selected_{t-1} = \{(\instance_i, y_{\globalf, i}, \{y_{\cFunc_\conIdx, i}\}_{\conIdx\in\conSpace})\}_{i=1,\dots,t-1}$, the posterior also takes the form of a \GP, with mean 
\begin{equation}\label{eq:posterior_mean}
\mu_{t}(\instance) = k_{t}(\instance)^\top(\GramMat_{t}+\sigma^2I)^{-1}\by_t    
\end{equation}
and covariance 
\begin{equation} \label{eq:posterior_covar}
k_{t}(\instance, \instance') = k_{}(\instance,\instance')-k_{t}(\instance)^\top(\GramMat_{t}+\sigma^2I)^{-1}k_{t}(\instance')    
\end{equation}
where $k_{t}(\instance) \triangleq \bracket{k(\instance_1, \instance),\dots, k(\instance_{t-1}, \instance)}^\top$, $\by_t$ is the vector of the corresponding noisy observations, and $\GramMat_{t} \triangleq \bracket{k(\instance, \instance')}_{\instance,\instance' \in \Selected_{t-1}}$ is a positive definite kernel matrix \citep{rasmussen:williams:2006}.
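
As a sanity check on \eqref{eq:posterior_mean} and \eqref{eq:posterior_covar}, consider the simplest case in which the history contains a single observation $(\instance_1, y_1)$ of one of the black-box functions. The symbols $y_1$, $\mu_{\mathrm{post}}$, and $k_{\mathrm{post}}$ are introduced here only for illustration, and, as in \eqref{eq:posterior_mean}, the prior mean is taken to be zero. The kernel matrix then reduces to the scalar $k(\instance_1,\instance_1)$, and the posterior becomes
\begin{align*}
\mu_{\mathrm{post}}(\instance) &= \frac{k(\instance_1,\instance)}{k(\instance_1,\instance_1)+\sigma^2}\, y_1, \\
k_{\mathrm{post}}(\instance,\instance') &= k(\instance,\instance') - \frac{k(\instance_1,\instance)\,k(\instance_1,\instance')}{k(\instance_1,\instance_1)+\sigma^2}.
\end{align*}
In particular, the posterior variance at the queried point is $k_{\mathrm{post}}(\instance_1,\instance_1) = k(\instance_1,\instance_1)\,\sigma^2/(k(\instance_1,\instance_1)+\sigma^2)$, which vanishes as $\sigma^2\to 0$ and remains strictly positive whenever $\sigma^2 > 0$ and $k(\instance_1,\instance_1) > 0$.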

The definition of reward plays an important role in analyzing online learning algorithms. Throughout the rest of the paper, we define the reward of CBO as follows and defer the detailed discussion of alternative reward choices to \appref{sec:reward}. 
\begin{align}\label{eq: reward}
\reward(\instance_t) = 
\begin{cases}
    y_{\globalf,t} & \text{if~} y_{\cFunc_\conIdx,t} \geq 0 \quad \forall \conIdx\in\conSpace,\\
    -\infty &\text{otherwise.}
\end{cases}
\end{align}


We want to efficiently locate the global maximizer
$$\instance^* = \argmax_{\instance\in\searchSpace,\; \cFunc_\conIdx(\instance) > 0\; \forall \conIdx\in\conSpace}{f(\instance)}.$$ 
More specifically, we seek to establish an upper bound on the performance in terms of the expected regret at time $t$, taken with respect to the distribution over $f$ at time $t$ given the historical observations $\Selected_{t-1}$:
$$\regret_t(\instance) \defeq \E_{f} \bracket{\reward(\instance^*) \mid \Selected_{t-1}} -  \E_{f}\bracket{\reward(\instance) \mid \Selected_{t-1}}.$$

Formally, given a confidence level $\delta$ and a constant $\epsilon_\globalf$, we want to guarantee that, after exhausting a budget $T$ that depends on $\delta$ and $\epsilon_\globalf$, the regret over the identified region $\roi\subseteq\searchSpace$ admits a high-probability upper bound:
$$
P\left(\max_{\instance\in\roi}\regret_T(\instance) \geq \epsilon_\globalf \right) \leq \delta.
$$

\begin{rem}\label{rem: interior}
    As discussed in the literature \citep{antonio2021sequential, donskoi1993partially, rudenko1994objective, sergeyev2007one, sacher2018classification, bachoc2020gaussian}, the reward of CBO is not defined outside the feasible region. This reward definition in \eqref{eq: reward}, along with both the aleatoric and epistemic uncertainties of the underlying black-box functions, necessitates excluding boundary candidates from the formulation to ensure the soundness of the CBO objective. 
    {For example, consider a scenario where $\instance^* = \argmax_{\instance\in\searchSpace, \forall \conIdx\in\conSpace, \cFunc_\conIdx(\instance) \geq 0}{f(\instance)}$ and $\exists \conIdx\in\conSpace, \cFunc_\conIdx(\instance^*) = 0$. In this case, the observation $y_{\cFunc_\conIdx, t} = \cFunc_\conIdx(\instance_t) + \epsilon$ at $\instance_t = \instance^*$ is pure noise ($\epsilon$), with $\Pr{y_{\cFunc_\conIdx, t} < 0} = 0.5$. According to \eqref{eq: reward}, this results in $\Pr{\reward(\instance_t) = -\infty} \geq 0.5$, which makes the high-probability regret bound stated above unattainable.} 
    Therefore, we aim to find the \emph{interior optimum} of the black-box constrained objective.
\end{rem}
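
To illustrate the boundary argument in Remark~\ref{rem: interior}, suppose, as an assumption made only for this example, that a single constraint has margin $\gamma \defeq \cFunc_\conIdx(\instance_t) \geq 0$ at the queried point, and let $\Phi$ denote the standard normal cumulative distribution function. Since $y_{\cFunc_\conIdx,t} = \gamma + \epsilon$ with $\epsilon\sim\mathcal{N}(0,\sigma^2)$,
$$
P\left(y_{\cFunc_\conIdx,t} < 0\right) = P\left(\epsilon < -\gamma\right) = \Phi\left(-\frac{\gamma}{\sigma}\right).
$$
This probability equals $1/2$ on the boundary ($\gamma = 0$) and decreases monotonically as $\gamma$ grows; for instance, it is roughly $0.023$ when $\gamma = 2\sigma$. It falls below a target level $\delta$ only when $\gamma > \sigma\,\Phi^{-1}(1-\delta)$, so the chance of a $-\infty$ reward from this constraint can be made small only at points whose margin is large relative to the noise level.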

\begin{figure}[t]
    \centering
        {
            \includegraphics[trim={1.5cm 11cm 3cm 3.8cm}, width=.5\textwidth]{./fig/ICLR24_CBO_Pipeline_comp_iclr.pdf}
    }
    \caption{The pipeline of our proposed algorithm, \algname. 
    In the left box, we maintain Gaussian process surrogates for the unknown objective and each constraint. The dotted curve shows the actual function, the red curve is the predicted mean, and the shaded area is the confidence interval. 
    In the right box, we derive acquisition functions from each Gaussian process. The general acquisition function combines these, but only over specific Regions of Interest (ROIs). 
    The grey area in the acquisition plot represents a region excluded by our ROI identification, a process detailed in Section~\ref{sec:algorithm}. For a step-by-step visual breakdown of this filtering, please see \figref{fig:1D_illustration}.
    }    
  \label{fig:pipeline}
\end{figure}
