\section{Discussions}
Here, we offer additional explanation and discussion of \algname.
\subsection{Additional Explanation of \algname}\label{sec:add_alg}
In \algref{alg:main}, $\{\instance_{g_t,t}\}$ in line 11 are acquired either in line 7 as $\instance_{\cFunc_\conIdx, t}$ or in line 9 as $\instance_{f, t}$, since $\mathcal{G}$ is composed of $\cFunc_\conIdx$ and $\globalf$. Roughly speaking, we are taking $\argmax_{g, \instance}$, yet we avoid this notation for two reasons: (1) equation 5 and equation 6 are maximized over different domains; (2) the domain for equation 6 could even be empty. Therefore, we take the $\argmax$ of equation 5 and equation 6 separately over their respective domains (when non-empty), and then take the $\argmax$ of the corresponding acquisition function values, as in line 11.
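
As a rough sketch of this two-stage selection, write $\alpha_{g,t}$ for the acquisition function associated with $g \in \mathcal{G}$ and $D_{g,t}$ for the domain over which it is maximized at step $t$. Both symbols are shorthand assumed here purely for illustration and are not notation from the main text. Under that assumption, the procedure can be read as
\[
\instance_{g,t} \in \argmax_{\instance \in D_{g,t}} \alpha_{g,t}(\instance) \quad \text{for each } g \in \mathcal{G} \text{ with } D_{g,t} \neq \emptyset,
\]
\[
g_t \in \argmax_{g \in \mathcal{G} \,:\, D_{g,t} \neq \emptyset} \alpha_{g,t}(\instance_{g,t}),
\]
so that line 11 returns $\instance_{g_t,t}$. Any $g$ whose domain is empty is simply skipped in the outer maximization, which is why a single joint $\argmax_{g, \instance}$ would be imprecise.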

\subsection{On the Comparability of Acquisition Functions over Different Underlying Functions}
{The acquisition functions for both objective optimization and active learning are based on confidence intervals, which reflect uncertainty and are intrinsically comparable. Under \assref{apt: sample_gp}, which states that the black-box underlying functions are samples from the corresponding GPs specified by the kernels, we use the kernels to capture the scaling of the different unknowns. Our analysis does not assume that the kernels are the same, meaning that the theoretical results hold when the objective and constraints are of different scales. The analysis thus converts the algorithm's sensitivity to scale into sensitivity to hyperparameter misspecification. In our experiments, we report the results obtained by following the standard practice of kernel learning \citep{rasmussen:williams:2006} for both the proposed algorithm and the baselines, as stated at the end of the caption of \figref{fig:exps:scan_res}.} {In summary, the comparability is guaranteed by properly specified kernels. Recent advances in self-correcting BO \citep{hvarfner2024self} and BO with unknown hyperparameters \citep{berkenkamp2019no} propose various methods to address this challenge.}

\revise{As for the practical question of why the analysis does not require normalizing the different acquisition functions, the answer is threefold. First, since the correlation between the constraints and the objective is unknown, the objective may be of a smaller scale overall yet have the largest gradient near the boundary, meaning that the overall scale of the functions offers no guarantee for normalizing the near-boundary uncertainties. Second, using the ROI to constrain the acquisition helps exclude useless uncertainty reduction, as the ROI considers both the objective and the constraints. If constraints dominate the \algname acquisition, this suggests that the selected points are still likely to contain the global optimum, since their objective values do not have a high probability of being suboptimal. Such a query is not wasted. A concrete example is illustrated in \figref{fig:1D_illustration}. Third, since we assume a universal upper bound for each constraint, the scale difference could make certain constraints dominant in $\epsilon_\conIdx$. This could be addressed by normalizing the observations given prior knowledge.}

\subsection{Difference from Other Existing CBO Methods with No-regret Guarantee}
We briefly discuss the differences between \algname and previous theoretical results in CBO. \citet{lu2022no} addresses equality constraints for instantaneous penalty-based regret. However, its reward formulation differs from ours. \citet{lu2023no} offers theoretical results on cumulative regret and violations. Yet they assume that querying points outside the feasible region still yields rewards, and they consider violations separately.


In general, we are unaware of existing CBO analyses that yield a guarantee similar to ours under the assumption that querying infeasible points does not yield a reward. One key difference is that, with the active learning component and the feasibility assumption, we can guarantee querying a feasible point whose reward converges to the optimal value with the desired confidence. Under our specific reward formulation, we regard this guarantee, and therefore our contribution in algorithm design and analysis, as sufficiently different from previous work, even when focusing only on the coupled setting.


\subsection{Empty Subsets of Search Space}
Certain subsets discussed in \secref{sec:algorithm} could be empty at a certain $t$ as a result of intersections. However, according to the assumptions in \secref{sec: analysis} and \lemref{lem: roi}, with high probability a properly chosen $\beta$ does not result in over-aggressive filtering. From this perspective, the ROI $\roi$ is well defined. \algname is also robust to empty $U_{\cFunc_\conIdx, t}$. As shown in \algref{alg:main}, the domains over which the acquisition functions defined in \eqref{eq:acqC} and \eqref{eq:acqF} are maximized allow $U_{\cFunc_\conIdx, t}$ to be empty without preventing \algname from proceeding.


\subsection{Limitations and Future Work}\label{sec:limitations}
The limitations of \algname include (1) the inefficiency of identifying the ROIs, due to the pointwise comparison in the current implementation, which relies on discretization; and (2) the lack of discussion of correlated unknowns, which are common in practice (e.g., two constraints that are actually the lower and upper bounds on the same quantity). Though we briefly discuss and study the corresponding scenarios, we expect that follow-up work could improve the algorithm's effectiveness and the comprehensiveness of the corresponding analysis.

