\section{Reward Function}\label{sec:reward}
\subsection{Reward choice 1: product of objective value and feasibility}
The definition of the reward plays an important role in the performance analysis of online machine learning. In the CBO setting, one possible definition of the constrained reward, derived from the nature of the constraints, is $\reward(\instance) = \globalf(\instance)\prod_\conIdx \mathbb{I}_{\cFunc_\conIdx(\instance)\geq h_\conIdx}$, assuming $f(\instance) > 0$. Taking both the aleatoric and the epistemic uncertainty on the constraints into account, we can transform the problem into finding the maximizer
\begin{align*}
    \argmax_{\instance\in\searchSpace}{\reward}(\instance) = \argmax_{\instance\in\searchSpace} f(\instance)\prod_\conIdx\Pr{Y_{\cFunc_\conIdx}(\instance)\geq h_\conIdx}
\end{align*}
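Here $Y_{\cFunc_\conIdx}(\instance)$ denotes the (possibly noisy) observation of the constraint $\cFunc_\conIdx$ at $\instance$; writing the joint feasibility as a product of per-constraint probabilities treats the constraint observations as independent.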
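The product reward above can be sketched in a few lines. This is a minimal illustration, not the method used in our experiments: it assumes that the predictive distribution of each constraint observation at $\instance$ is Gaussian with a given mean and standard deviation (for example, a GP posterior combined with observation noise, which covers both kinds of uncertainty), and that the constraints are independent. The helper names \texttt{feasibility} and \texttt{product\_reward} are hypothetical.

```python
from statistics import NormalDist
from typing import Sequence


def feasibility(mean: float, std: float, threshold: float) -> float:
    """Pr[Y >= threshold] for a Gaussian Y ~ N(mean, std^2)."""
    return 1.0 - NormalDist(mu=mean, sigma=std).cdf(threshold)


def product_reward(
    f_value: float,
    con_means: Sequence[float],
    con_stds: Sequence[float],
    thresholds: Sequence[float],
) -> float:
    """Hypothetical sketch of f(x) * prod_j Pr[Y_j(x) >= h_j].

    Assumes f(x) > 0, independent constraints, and a Gaussian predictive
    distribution N(con_means[j], con_stds[j]^2) for each constraint observation.
    """
    if f_value <= 0:
        raise ValueError("the product reward assumes f(x) > 0")
    if not len(con_means) == len(con_stds) == len(thresholds):
        raise ValueError("need one mean, std, and threshold per constraint")
    reward = f_value
    for mean, std, h in zip(con_means, con_stds, thresholds):
        reward *= feasibility(mean, std, h)
    return reward


# One constraint whose predictive mean sits exactly on its threshold:
# the feasibility is 0.5, so the reward halves the objective value.
print(product_reward(10.0, [0.0], [1.0], [0.0]))  # 5.0
```

The requirement $f(\instance) > 0$ is enforced explicitly, since with a negative objective a lower feasibility would increase the product.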


One problem with this product reward is that, if we regard the task as a multi-objective optimization whose objectives are $f(\instance)$ and $\Pr{Y_{\cFunc_\conIdx}(\instance)\geq h_\conIdx}$, it is likely to give rise to a Pareto front. This multi-objective nature suggests that the optimization may be harder to converge than a single-objective, unconstrained BO problem, although a unique global optimum is not guaranteed in the latter either. More critically, once the feasibility reaches a certain threshold, we prefer to focus on optimizing the objective value rather than the product, for the following reasons.

First, assuming the constraint observation follows a Gaussian distribution, the marginal gain in feasibility from increasing the value of the constraint function decreases once the feasibility exceeds 0.5, because the Gaussian density peaks at the mean. In the tail region in particular, improving the feasibility, and hence the product of feasibility and objective value, by optimizing the constraint function is prohibitively difficult.
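To make this concrete, assume a single constraint observed with Gaussian noise of standard deviation $\sigma$, so that the feasibility is $p = \Phi\big((\cFunc_\conIdx(\instance) - h_\conIdx)/\sigma\big)$, where $\Phi$ and $\phi$ denote the standard normal CDF and density. The derivative of $p$ with respect to the constraint value is then $\phi(\Phi^{-1}(p))/\sigma$, which is largest at $p = 0.5$ and shrinks in the tail. The sketch below (helper names are hypothetical) tabulates this marginal gain together with the increase in the constraint value needed to raise the feasibility by 0.04 from several starting levels.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def marginal_gain(p: float, sigma: float = 1.0) -> float:
    """d(feasibility) / d(constraint value) at feasibility level p."""
    z = STD_NORMAL.inv_cdf(p)  # standardized margin (c - h) / sigma
    return STD_NORMAL.pdf(z) / sigma


def required_shift(p_from: float, p_to: float, sigma: float = 1.0) -> float:
    """Increase in the constraint value needed to move feasibility p_from -> p_to."""
    return sigma * (STD_NORMAL.inv_cdf(p_to) - STD_NORMAL.inv_cdf(p_from))


for p in (0.5, 0.7, 0.9, 0.95):
    print(f"p={p:.2f}  gain={marginal_gain(p):.3f}  "
          f"shift for +0.04: {required_shift(p, p + 0.04):.3f} sigma")
```

The same 0.04 improvement costs several times more movement in the constraint value near $p = 0.95$ than near $p = 0.5$, which is the diminishing return described above.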

Second, outside of applications that focus specifically on feasibility (where the feasibility should be treated as another objective, making the task inherently a multi-objective optimization), the practical benefit of increasing the feasibility generally decays faster than the benefit of increasing the objective value. For example, when choosing between doubling the feasibility from 0.25 to 0.5 and doubling the objective value drop from 25 to 50, we would probably favor the former, since a feasibility of 0.25 means the constraint is unlikely to be satisfied at all. However, when choosing between increasing the feasibility from 0.8 to 0.9 and increasing the objective drop from 80 to 90, there is no such clear preference. Hence, once the feasibility reaches a certain level, users would likely favor gains in the objective function. Based on this insight, we propose the following reward for constrained optimization tasks.
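The two scenarios above can be checked with simple arithmetic. Starting from (feasibility, objective) pairs of $(0.25, 25)$ and $(0.8, 80)$, both options in each scenario give the same product ($12.5$ and $72$, respectively), so the product reward is indifferent in exactly the cases where the stated preferences differ. For contrast, the sketch also scores the options with a thresholded reward that returns the objective only when the feasibility clears a level, anticipating the reward proposed next. The level of 0.4 and the function names are hypothetical choices for illustration, and the objective here stands for the objective value drop (larger is better).

```python
import math


def product_reward(feasibility: float, objective: float) -> float:
    """Reward choice 1: objective value weighted by feasibility."""
    return feasibility * objective


def thresholded_reward(feasibility: float, objective: float, level: float = 0.4) -> float:
    """Objective value if feasibility clears a (hypothetical) level, else -inf."""
    return objective if feasibility >= level else -math.inf


# (feasibility, objective) after each option, for the two scenarios in the text.
scenarios = {
    "low feasibility": {"raise feasibility": (0.5, 25.0), "raise objective": (0.25, 50.0)},
    "high feasibility": {"raise feasibility": (0.9, 80.0), "raise objective": (0.8, 90.0)},
}

for name, options in scenarios.items():
    for option, (p, obj) in options.items():
        print(f"{name:16s} {option:17s} product={product_reward(p, obj):5.1f} "
              f"thresholded={thresholded_reward(p, obj)}")
```

With the level at 0.4, the thresholded reward prefers raising feasibility in the first scenario and raising the objective in the second, matching the preferences described above, whereas the product reward ties in both.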

\subsection{Reward choice 2: objective value once the feasibility reaches a threshold}
Instead of defining the reward as the product of the objective value and the feasibility, we examine the probabilistic constraints more closely and distinguish between epistemic and aleatoric uncertainty. First, assuming the observations of the constraints are noise-free, namely $Y_{\cFunc_\conIdx}(\instance) = \cFunc_\conIdx(\instance)$, we can simply replace the feasibility probability of each constraint with the indicator function $\mathbb{I}(\cFunc_\conIdx(\instance) \geq h_\conIdx)$. This definition accommodates scenarios in which the infeasible region yields no credible reward, e.g., due to simulation failures, as discussed by \cite{sacher2018classification, bachoc2020gaussian}:

\begin{align}
    \reward(\instance) =
    \begin{cases}
        f(\instance) & \text{if } \cFunc_\conIdx(\instance) \geq h_\conIdx \quad \forall \conIdx\in\conSpace,\\
        -\infty & \text{otherwise.}
    \end{cases}\label{eq: reward}
\end{align}
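The noise-free reward can be written as a short function. This is a minimal sketch under the noise-free assumption of this paragraph; the name \texttt{constrained\_reward} and its arguments are hypothetical, and the constraint values and thresholds are passed as plain lists.

```python
import math
from typing import Sequence


def constrained_reward(
    f_value: float,
    con_values: Sequence[float],
    thresholds: Sequence[float],
) -> float:
    """Noise-free reward: f(x) if every c_j(x) >= h_j, otherwise -inf."""
    if len(con_values) != len(thresholds):
        raise ValueError("need one threshold per constraint")
    feasible = all(c >= h for c, h in zip(con_values, thresholds))
    return f_value if feasible else -math.inf


print(constrained_reward(3.2, [0.1, 2.0], [0.0, 1.5]))   # 3.2
print(constrained_reward(3.2, [-0.1, 2.0], [0.0, 1.5]))  # -inf
```

Because infeasible points receive $-\infty$, every feasible point dominates every infeasible one, and, unlike the product reward, $f$ does not need to be positive.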

Next, if the observations of the constraints are perturbed by known Gaussian noise, namely $Y_{\cFunc_\conIdx}(\instance) \sim \normal{(\cFunc_\conIdx(\instance), \sigma^2)}$ with noise standard deviation $\sigma$, we can handle the aleatoric uncertainty with a user-specified confidence level $\chi_\conIdx \in (0,1)$ for each constraint $\conIdx\in\conSpace$. Following the definition proposed by \citet{gelbart2014bayesian}, we turn the indicator $\mathbb{I}(Y_{\cFunc_\conIdx}(\instance) \geq h_\conIdx)$ into the probabilistic constraint
\[
    \Pr{Y_{\cFunc_\conIdx}(\instance) \geq h_\conIdx} \geq \chi_\conIdx,
\]
which explicitly accounts for the aleatoric uncertainty. Since $\Pr{Y_{\cFunc_\conIdx}(\instance) \geq h_\conIdx} = \Phi\big((\cFunc_\conIdx(\instance) - h_\conIdx)/\sigma\big)$, where $\Phi$ is the standard normal CDF, we can use the percent point function (PPF), i.e., the inverse CDF, to transform each probabilistic constraint into the deterministic constraint $\mathbb{I}(\cFunc_\conIdx(\instance) \geq \hat{h}_\conIdx)$ with $\hat{h}_\conIdx = \textit{PPF}(\chi_\conIdx; h_\conIdx, \sigma) = h_\conIdx + \sigma\Phi^{-1}(\chi_\conIdx)$. In other words, $\hat{h}_\conIdx$ is the $\chi_\conIdx$ percent point of a Gaussian distribution with mean $h_\conIdx$ and standard deviation $\sigma$. Hence, the user-specified confidence levels let us unify the rewards for noise-free and noisy observations of the constraints. For simplicity and without loss of generality, we adopt the definition in \eqref{eq: reward} and set all $\hat{h}_\conIdx = 0$ (by shifting each constraint function by $\hat{h}_\conIdx$).
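The PPF step can be sketched with Python's standard library. This is a minimal illustration, assuming a known noise standard deviation $\sigma$ for the constraint; the helper names \texttt{adjusted\_threshold} and \texttt{prob\_satisfied} and the example values are hypothetical. The sketch computes $\hat{h}_\conIdx$ as the $\chi_\conIdx$-quantile of $\mathcal{N}(h_\conIdx, \sigma^2)$ and checks that a constraint value sitting exactly at $\hat{h}_\conIdx$ satisfies the probabilistic constraint with equality.

```python
from statistics import NormalDist


def adjusted_threshold(h: float, sigma: float, chi: float) -> float:
    """chi-quantile of N(h, sigma^2), i.e. h + sigma * Phi^{-1}(chi)."""
    if not 0.0 < chi < 1.0:
        raise ValueError("confidence level must lie in (0, 1)")
    return NormalDist(mu=h, sigma=sigma).inv_cdf(chi)


def prob_satisfied(c_value: float, h: float, sigma: float) -> float:
    """Pr[Y >= h] for a noisy observation Y ~ N(c_value, sigma^2)."""
    return 1.0 - NormalDist(mu=c_value, sigma=sigma).cdf(h)


h, sigma, chi = 1.0, 0.5, 0.95
h_hat = adjusted_threshold(h, sigma, chi)
print(f"h_hat = {h_hat:.4f}")
print(f"Pr[Y >= h] at c = h_hat: {prob_satisfied(h_hat, h, sigma):.4f}")
```

Any constraint value above $\hat{h}_\conIdx$ satisfies the probabilistic constraint with probability above $\chi_\conIdx$, so the adjusted thresholds can be plugged into the noise-free indicator reward in place of $h_\conIdx$.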
