\section{Additional Experiments} \label{sec:additional_exps}
Here, we provide additional experimental results for \algname.

\subsection{Robustness to Choices of $\beta$}
\begin{figure*}
    \centering
        {
      \includegraphics[trim={8cm 0cm 8cm 2cm}, width=.8\textwidth]{./fig/simple_regret_fbeta_iclr.pdf}
    }
  \caption{Simple regret of \algname for different choices of the constant $\beta$. The theoretical values of $\beta$ are 6.51 for Rastrigin-1D-1C, 6.47 for Ackley-5D-2C, and 6.51 for Converter-36D-3C. The results are collected from 15 independent trials.
  }   
  \label{fig:exps:scan_beta}
  \end{figure*}

  {
    As shown in \figref{fig:exps:scan_beta}, the algorithm is robust to moderate values of $\beta$. The exception is Ackley with $\beta=0.1$, where the ROI filtering is over-aggressive and traps the model in a local region once very few candidates remain in the ROI. Certain choices of $\beta$ perform slightly better, but the differences do not affect convergence and lack statistical significance. We believe that the acquisitions in \eqnref{eq:acqC} and \eqnref{eq:acqF}, together with the $\roi$ identification when the models are well fitted, contribute to this robustness. Unlike conventional GP-UCB \citep{srinivas2009gaussian}, the acquisition functions are standardized with the (maximum) lower confidence bound, and the search domain is filtered when historical observations suggest poor performance in nearby areas.
  }
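
For intuition on the role of $\beta$, recall the generic confidence bounds used in GP-UCB-style methods. The notation below is an illustrative assumption (posterior mean $\mu_{t-1}$, posterior standard deviation $\sigma_{t-1}$, generic input $x$) and may differ from the exact definitions in the main text:
\[
  \mathrm{UCB}_t(x) = \mu_{t-1}(x) + \beta^{1/2}\sigma_{t-1}(x), \qquad
  \mathrm{LCB}_t(x) = \mu_{t-1}(x) - \beta^{1/2}\sigma_{t-1}(x).
\]
Suppose, as a simplification, that a candidate $x$ is retained only when $\mathrm{UCB}_t(x) \ge \max_{x'} \mathrm{LCB}_t(x')$. As $\beta$ decreases, every upper bound falls (weakly) and the maximum lower bound rises (weakly), so the retained set can only shrink. This is consistent with the over-aggressive filtering observed at $\beta = 0.1$.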

\subsection{Wall Time}

{\Tabref{table: walltime} reports the wall time of \algname and the baselines. \algname is roughly 3.5 to 5.9 times faster than CMES-IBO and 1.4 to 1.9 times faster than cEI, while SCBO is the fastest method. We attribute the efficiency of \algname to the ROI filtering, which reduces the search space, although the ROI identification incurs an additional cost for membership checks.}

\begin{table}
  \centering
  \begin{tabular}{lcccc}
      \toprule
      \textbf{Problem} & \textbf{\algname} & \textbf{CMES-IBO} & \textbf{SCBO} & \textbf{cEI} \\
      \midrule
      Rastrigin-1D-1C & 144.29 & 545.83 & 32.39 & 231.12 \\
      Ackley-5D-2C & 96.19 & 565.10 & 25.43 & 180.39 \\
      Converter-36D-3C & 190.05 & 660.27 & 31.73 & 267.36 \\
      \bottomrule
  \end{tabular}
  \caption{Average wall time (sec) of different CBO methods over 15 independent trials.}\label{table: walltime}
\end{table}

\begin{figure*}
  \centering
    {
    \includegraphics[trim={8cm 3cm 7cm 3cm}, width=1.0\textwidth]
    {./fig/simple_regret_scan_aaai.pdf}
    }
    \caption{Each column corresponds to a threshold choice for the single constraint $c(\instance) = |\instance+0.7|^{1/2}$ in the Rastrigin-1D-1C task. The portion of the search space that is feasible is denoted in the title of each panel. The first row shows 1000 samples of the noise-free objective function, with black dots marking the infeasible region and purple dots marking the feasible region; the panels differ in their feasible regions. The second row shows the corresponding simple regret curves. We test each method with 15 independent trials and impose observation noise sampled from $\normal{(0, 0.1)}$, which is not shown in the first row. The scaling and length scale of the GPs are learned via maximum likelihood estimation.
    }  \label{fig:exps:scan_config}
\end{figure*}

\begin{figure*}
  \centering
      {
        \includegraphics[trim={4.3cm 0cm 4cm 1cm}, width=.7\textwidth]{./fig/simple_regret_uai.pdf}
  }
\caption{The input dimensionality, the number of constraints, and the approximate portion of the feasible region in the whole search space are denoted in the title of each task. The curves show the average simple regret after standardization, while the shaded area denotes the 95\% confidence interval throughout the optimization.
}   
\label{fig:exps:all_res_config}

\end{figure*}


\subsection{Additional Comparison with CONFIG}\label{sec:config}
Although its objective is defined differently, we add CONFIG from \cite{xu2023constrained} as an additional baseline. The results are shown in \figref{fig:exps:scan_config} and \figref{fig:exps:all_res_config}. We observe that \algname outperforms or at least matches CONFIG on all problems in our setting. Specifically, in the early stage of the Rastrigin-1D-1C task and throughout the Ackley-5D-2C task, \revise{where the underlying objective fluctuates strongly, as shown in \figref{fig:exps:scan_config} for Rastrigin-1D-1C, CONFIG fails to enter the feasible region consistently even after spending a substantial budget and gets stuck learning the constraints passively.}

\revise{At the same time, we observe that on Converter-36D-3C, Vessel-4D-3C, and Spring-3D-6C, CONFIG generally matches the performance of \algname. We hypothesize that in these applications, the constraint-learning component of \algname is not as beneficial as directly optimizing the underlying function in possibly feasible regions, as the unknown feasibility coincides with the optimality of the underlying objectives. Still, \algname is more consistent across all benchmarks, highlighting the efficiency and necessity of its adaptive trade-off between active learning and optimization when no reward is incurred outside the feasible region.} This difference also underscores the need to actively learn complex underlying constraints to guarantee stable convergence to a feasible optimum.

\subsection{Additional Comparison with SVM-CBO}
SVM-CBO \citep{antonio2021sequential} offers a practicality-oriented solution. It uses an SVM to efficiently estimate the decision boundary of the feasible region. Analyzing SVM learning in combination with the coverage-oriented first-phase acquisition function makes regret analysis difficult. In addition, SVM-CBO requires an explicit split between feasibility identification and optimization within the feasible region, and demands different performance metrics for evaluation. This split makes a direct comparison with \algname, which does not explicitly separate the two processes, somewhat challenging. Nonetheless, we follow the practice in that paper, which uses a 10:60:30 split among random sampling, phase 1, and phase 2 of SVM-CBO, and report the simple regret.
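
For reference, a common definition of simple regret in constrained maximization is given below. We state it as an assumption, since the exact form and notation used in the main text may differ:
\[
  r_T = f(x^\star) - \max_{t \le T,\; x_t \in \mathcal{F}} f(x_t),
\]
where $\mathcal{F}$ denotes the feasible set (a symbol introduced here for illustration), $x^\star$ is the feasible optimum, and $x_t$ is the query at iteration $t$. Under this convention, a method is credited only for the feasible points it has evaluated, regardless of which phase produced them.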
\begin{table}[ht]
\centering
\resizebox{\textwidth}{!}{%
\begin{tabular}{|c|c|c|c|c|c|}
\hline
Experiment & COBAR-70 & CMES-IBO-70 & cEI-70 & SCBO-70 & SVM-CBO-70 \\
\hline
Rastrigin-1D-1C-60\% & 3.80e+00 (1.85e+00) & \textbf{3.00e+00 (1.84e+00)} & 1.08e+01 (3.02e+00) & 4.83e+00 (2.12e+00) & 4.83e+00 (1.43e+00) \\
Ackley-5D-2C-14\% & \textbf{3.71e-02 (7.19e-04)} & 4.41e-02 (6.67e-03) & 7.94e-02 (1.46e-02) & 1.43e-01 (0.00e+00) & 1.11e-01 (2.45e-02) \\
Converter-36D-3C-27\% & \textbf{9.50e-01 (1.77e-01)} & 1.32e+00 (2.31e-01) & 2.23e+00 (0.00e+00) & 2.23e+00 (0.00e+00) & 1.09e+00 (1.52e-01) \\
Vessel-4D-3C-78\% & \textbf{2.06e-02 (1.39e-02)} & 1.20e+00 (3.90e-01) & 2.44e-01 (2.48e-01) & 3.79e+00 (1.25e+00) & 2.59e-02 (3.36e-02) \\
Car\_Cabin-7D-8C-13\% & 8.62e+00 (3.30e+00) & 1.40e+01 (2.14e+00) & 6.15e+00 (2.28e+00) & \textbf{5.75e+00 (2.03e+00)} & 6.84e+00 (3.24e+00) \\
Spring-3D-6C-0.38\% & \textbf{6.40e+01 (2.83e+01)} & 1.11e+02 (1.50e+01) & 1.11e+02 (1.50e+01) & 1.11e+02 (1.50e+01) & 8.35e+01 (2.76e+01) \\
\hline
\hline
Experiment & COBAR-100 & CMES-IBO-100 & cEI-100 & SCBO-100 & SVM-CBO-100 \\
\hline
Rastrigin-1D-1C-60\% & \textbf{2.21e+00 (1.41e+00)} & 2.84e+00 (1.62e+00) & 1.07e+01 (3.07e+00) & 4.83e+00 (2.12e+00) & 2.67e+00 (8.14e-01) \\
Ackley-5D-2C-14\% & 3.69e-02 (3.08e-03) & \textbf{3.56e-02 (5.93e-03)} & 5.88e-02 (7.24e-03) & 1.43e-01 (0.00e+00) & 1.09e-01 (2.64e-02) \\
Converter-36D-3C-27\% & \textbf{9.29e-01 (1.27e-01)} & 1.22e+00 (2.02e-01) & 2.14e+00 (1.75e-01) & 2.23e+00 (0.00e+00) & 9.73e-01 (1.45e-01) \\
Vessel-4D-3C-78\% & \textbf{1.94e-02 (1.43e-02)} & 6.48e-01 (3.60e-01) & 1.51e-01 (1.25e-01) & 3.79e+00 (1.25e+00) & 2.23e-02 (8.96e-04) \\
Car\_Cabin-7D-8C-13\% & 6.40e+00 (2.72e+00) & 1.12e+01 (2.71e+00) & 5.92e+00 (2.34e+00) & \textbf{5.75e+00 (2.03e+00)} & 6.03e+00 (2.40e+00) \\
Spring-3D-6C-0.38\% & \textbf{5.60e+01 (2.99e+01)} & 1.11e+02 (1.50e+01) & 1.11e+02 (1.50e+01) & 1.11e+02 (1.50e+01) & 8.35e+01 (2.76e+01) \\
\hline
\end{tabular}
}
\caption{Simple regret of different methods across experiments, with SVM-CBO included as an additional baseline. The upper block shows the simple regret at 70 iterations, while the lower block shows the simple regret at 100 iterations. The standard error is shown in parentheses.}
\label{table:experiments-SVM-CBO}
\end{table}

\Tabref{table:experiments-SVM-CBO} shows the simple regret at the end of each phase of SVM-CBO, i.e., after 70 and 100 iterations. We highlight the best simple regret in bold. The results demonstrate that COBAR ultimately outperforms or matches the best baseline.

To provide a clearer, high-level quantitative summary, we aggregate the final performance of the core methods from \Tabref{table:experiments-SVM-CBO}. \Tabref{tab:average_rank} shows the average rank of each method based on its mean simple regret at the final iteration ($T=100$) across all six problems. As the summary shows, \algname (COBAR) achieves the best overall rank, supporting its strong and consistent performance.
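
Concretely, the aggregation can be written as follows, where the notation is introduced here for illustration. Let $r_{m,p} \in \{1,\dots,4\}$ be the rank of method $m$ on problem $p$, with rank 1 assigned to the lowest mean simple regret at $T=100$. The average rank over the $P=6$ problems is
\[
  \bar{r}_m = \frac{1}{P} \sum_{p=1}^{P} r_{m,p}.
\]
As a purely hypothetical illustration, a method ranked first on three problems and second on the other three would obtain $\bar{r}_m = (3 \cdot 1 + 3 \cdot 2)/6 = 1.5$. How ties are ranked is a convention that can shift the averages slightly.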

\begin{table}[ht]
\centering
\caption{Average method ranking (lower is better) at the final iteration ($T=100$). Ranks are calculated for each of the 6 problems based on the mean simple regret data in \Tabref{table:experiments-SVM-CBO} and then averaged. SVM-CBO is excluded from the ranking to ensure a fair comparison with the core baselines from the main paper.}
\label{tab:average_rank}
\begin{tabular}{lcc}
\toprule
\textbf{Method} & \textbf{Average Rank} & \textbf{Overall Rank} \\
\midrule
COBAR    & $\sim 1.17$ & 1st \\
CMES-IBO & $\sim 2.33$ & 2nd \\
cEI      & $\sim 3.17$ & 3rd \\
SCBO     & $\sim 3.33$ & 4th \\
\bottomrule
\end{tabular}
\end{table}





