\section{Theoretical Results}
We summarize the required assumptions below, followed by the justification of each assumption.
\begin{assumption} \label{apt: sample_gp}
The utility functions $\utilF$ are sampled from corresponding, mutually independent GPs. That is, $\forall t \leq T, \instance\in\searchSpace, i\in\uSpace$, $\utilF(\instance)$ is a sample from the global $\mathcal{GP}_{\utilF, t}$.
\end{assumption}
This assumption is common in the literature; see, e.g., \cite{srinivas2009gaussian, gotovos2013active, zhang2023learning}.
\reviseFx{While devising a well-specified prior for the unknown function can be challenging in practice, recent work analyzes BO's behavior under prior misspecification \citep{bogunovic2021misspecified} or proposes solutions for unknown hyperparameters of the prior \citep{berkenkamp2019no, hvarfner2024self}. Although this direction is orthogonal to our work, we highlight the challenge and the potential for integrating these existing solutions.
}
\begin{assumption} \label{apt: mono_ci}
    Given the horizon $T$ and a proper choice of the constant $\beta$, the confidence intervals are well calibrated, meaning that a later posterior agrees with the earlier posteriors. Concretely, $\forall t_1 \leq t_2 \leq T, \instance\in\searchSpace, i\in\uSpace$, we have $\UCBit_{\utilF, t_1}(\instance) \geq \UCBit_{\utilF, t_2}(\instance)$ and $\LCB_{\utilF, t_1}(\instance) \leq \LCB_{\utilF, t_2}(\instance)$.
\end{assumption}

This is a mild assumption given recent work by \citet{koepernik2021consistency}, which shows that if the kernel is continuous and the sequence of sampling points is sufficiently dense, then the variance of the posterior \GP converges to zero monotonically and almost surely if the function is defined on a metric space, and the posterior mean converges to the unknown function pointwise in $\mathbf{L}^2$ if the unknown function lies in the RKHS of the prior kernel.

If the assumption is violated, the technique of taking the intersection of all historical confidence intervals, introduced by \citet{gotovos2013active}, similarly guarantees a monotonically shrinking confidence interval. That is, whenever there exist $t_1 \leq t_2 \leq T, \instance\in\searchSpace, i\in\uSpace$ such that $\UCBit_{\utilF, t_1}(\instance) < \UCBit_{\utilF, t_2}(\instance)$ or $\LCB_{\utilF, t_1}(\instance) > \LCB_{\utilF, t_2}(\instance)$, we set $\UCBit_{\utilF, t_2}(\instance) = \UCBit_{\utilF, t_1}(\instance)$ or $\LCB_{\utilF, t_2}(\instance) = \LCB_{\utilF, t_1}(\instance)$, respectively, to guarantee monotonicity.
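
As a sketch, the intersection rule can be written as a per-round update applied at every $t \geq 2$. The recursive form and the assignment arrow are our illustrative reading of the rule, not notation taken from \citet{gotovos2013active}:
$$\UCBit_{\utilF, t}(\instance) \leftarrow \min\{\UCBit_{\utilF, t-1}(\instance),\, \UCBit_{\utilF, t}(\instance)\}, \qquad \LCB_{\utilF, t}(\instance) \leftarrow \max\{\LCB_{\utilF, t-1}(\instance),\, \LCB_{\utilF, t}(\instance)\}.$$
Under this update the upper bound is non-increasing and the lower bound is non-decreasing in $t$ by construction.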

A direct consequence of the assumed monotonicity of the confidence intervals of $\utilF$ is the analogous monotonicity of the confidence intervals of $\partU$ and $\globalf$, and hence the monotonic shrinking of the ROI.
\begin{lemma}\label{lem: mono_ci}
Under \assref{apt: sample_gp} and \assref{apt: mono_ci}, $\forall t_1 \leq t_2 \leq T, \instance\in\searchSpace, i\in\uSpace$, we have $\UCBit_{\partU, t_1}(\instance) \geq \UCBit_{\partU, t_2}(\instance)$ and $\LCB_{\partU, t_1}(\instance) \leq \LCB_{\partU, t_2}(\instance)$. Moreover, $\forall t_1 \leq t_2 \leq T, \instance\in\searchSpace$, we have $\UCBit_{\globalf, t_1}(\instance) \geq \UCBit_{\globalf, t_2}(\instance)$ and $\LCB_{\globalf, t_1}(\instance) \leq \LCB_{\globalf, t_2}(\instance)$, and therefore $\roi^t \subseteq \roi^{t-1}$.
\end{lemma}

First, we justify the definition of the confidence intervals by showing that the identified ROI retains the global optimum with high probability.

\begin{lemma}\label{lem: roi}
    With the assumptions above, the region(s) of interest $\{\roi^t\}_{t \in [T]}$ defined in \eqref{eq:roi} contain the global optimum with high probability. That is, for all $\delta \in (0,1)$, $\forall t\geq 1$, and any finite discretization $\discreteSet$ of $\searchSpace$ containing the optimum $\instance^* = \argmin_{\instance\in \searchSpace}f(\instance)$, with $\beta=2\log(\uSpaceNum \vert \discreteSet \vert T/ \delta)$, we have $\Pr{\instance^* \in \roi^t} \geq 1-\delta$.
\end{lemma}
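The choice of $\beta$ can be motivated by a standard union-bound sketch. Assume, as in GP-UCB and only for illustration, that each confidence interval has the form $\mu \pm \beta^{1/2}\sigma$, where $\mu$ and $\sigma$ denote the posterior mean and standard deviation of $\utilF(\instance)$ at round $t$ (symbols introduced here, not defined in this section). The Gaussian tail bound then gives a per-event failure probability of at most $e^{-\beta/2}$, and a union bound over $i \in \uSpace$, $\instance \in \discreteSet$, and $t \leq T$ yields a total failure probability of at most
$$\uSpaceNum \vert \discreteSet \vert T \, e^{-\beta/2} = \delta \quad \text{when} \quad \beta = 2\log(\uSpaceNum \vert \discreteSet \vert T/\delta).$$
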
Finally, we bound the simple regret of the proposed \algoref{alg:main}.
For clarity, we denote $\discreteROI = \discreteSet \cap \roi^t$ and define $$CI_{\globalf^*, t } = [\min_{\instance \in \discreteROI}\LCB_{\globalf, t}(\instance),
\min_{\instance \in \discreteROI}\UCBit_{\globalf, t}(\instance)].$$

Let us define the maximum information gain about the function $\utilF$ after $T$ rounds, together with its sum over all utilities:
\begin{equation}\label{eq:gammaT}
\maxInfo_{\utilF, T} = \max_{\actionSet\subset \discreteSet: \vert \actionSet \vert=T}{\mutualinfo{y_\actionSet; \utilF}}
\text{~~and~~}
    \widehat{\maxInfo_T} = \sum_{i \in \uSpace}{\maxInfo_{\utilF, T}}    
\end{equation}
Note that previous work by \cite{srinivas2009gaussian} shows that the maximum information gain $\gamma$ is sublinear in $T$ for popular kernels.

Here, we justify that the proposed acquisition function reduces the width of the confidence interval of the global optimum efficiently.
\begin{theorem}\label{thm: simReg}
    The width of the resulting confidence interval of the global optimum $f^*=f(\instance^*)$ is bounded from above. That is, under the assumptions above, with a constant $\beta=2\log(\uSpaceNum\vert \discreteSet \vert T/ \delta)$ and $\instance^t = \argmax_{\instance \in \searchSpace} {\acqF(\instance, \searchSpace)}$, after $T \geq \frac{\beta \maxInfoARISE_T \hat{C}_1}{\epsilon^2}$ iterations, we have $$\Pr{\vert CI_{\globalf^*, T}\vert \leq \epsilon, \globalf^* \in CI_{\globalf^*, T }} \geq 1 - \delta$$
    Here $\hat{C}_1=8(\uSpaceNum+1)^2/\log(1+\sigma^{-2})$. 
\end{theorem}

The result above shows that the proposed acquisition function learns efficiently when it is maximized over the global search space. However, to balance exploration and exploitation so that the algorithm also identifies the global optimum with high probability while learning, we restrict the search space to the ROI, which achieves exploitation by design.

Combining the results above, the following theorem shows that, since a Nash Equilibrium exists and the points in the ROI are sufficiently close to $\instance^*$, \algname achieves an $\epsilon$-Nash Equilibrium with probability at least $1-\delta$.

\begin{theorem}\label{thm: eNE}
We assume the aforementioned assumptions hold and apply the same $\beta$ and acquisition function as in \algoref{alg:main}. In addition, we assume that after $T\geq \frac{\beta \maxInfoARISE_T \hat{C}_1}{\epsilon^2}$ iterations, for all $\instance \in \discreteROI$, it holds that $\UCB_{\utilF,t}(\minusIX, \discreteROI) = \UCB_{\utilF,t}(\minusIX, \discreteSet)$ and $\LCB_{\utilF,t}(\minusIX, \discreteROI) = \LCB_{\utilF,t}(\minusIX, \discreteSet)$. Then we have $$\Pr{\globalf(\instance^T)\leq \sqrt{\frac{\beta \maxInfoARISE_T \hat{C}_1}{T}}\leq \epsilon} \geq 1 - \delta$$
Here $\hat{C}_1=8(\uSpaceNum+1)^2/\log(1+\sigma^{-2})$.
\end{theorem}

\begin{remark}
    The additional assumption made in \thmref{thm: eNE} is mild, as it is satisfied when the points in the ROI are sufficiently close to the global optimum. In that case, these points resemble the defining property of Nash Equilibria, namely that the partial maximizer of each utility function coincides with $\instance$ when $\globalf(\instance)=0$. More formally, when $\instance \in \discreteROI$ converges to $\instance^*$ where $\globalf(\instance^*) = 0$, the partial maximizer $\argmax_{\instance \in \searchSpace} \partU(\minusIX)$ also converges to points in the ROI.
\end{remark}

Given that $\maxInfoARISE_T$ and $\beta$ are sublinear in $T$ and $\hat{C}_1$ is a constant, the result above shows that the proposed \algoref{alg:main} efficiently achieves an $\epsilon$-Nash Equilibrium with high probability.
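
To make the rate explicit, the bound in \thmref{thm: eNE} can be read as a tolerance that shrinks with $T$. Here $\epsilon_T$ is shorthand introduced only for this illustration:
$$\epsilon_T = \sqrt{\frac{\beta \maxInfoARISE_T \hat{C}_1}{T}}, \qquad \epsilon_T \to 0 \ \text{ whenever } \ \beta \maxInfoARISE_T = o(T).$$
For example, suppose every utility uses a squared-exponential kernel on a $d$-dimensional domain, with $\uSpaceNum$, $\vert \discreteSet \vert$, and $\delta$ held fixed (assumptions made only for this example, with $d$ a symbol not used elsewhere in this section). Then the bound $\maxInfo_{\utilF, T} = \mathcal{O}((\log T)^{d+1})$ from \cite{srinivas2009gaussian}, together with $\beta = \mathcal{O}(\log T)$, gives $\epsilon_T = \mathcal{O}\big(T^{-1/2}(\log T)^{(d+2)/2}\big)$.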

One direct consequence of \thmref{thm: eNE} is that, if every point in $\discreteSet$ other than the global optimum has a suboptimality gap exceeding $\epsilon$, then after sufficiently many queries the algorithm identifies $\instance^*$ as the only point in the ROI. In that case, \algname only queries $\instance^*$ and achieves zero regret afterward.

\begin{cor}\label{cor: zero-regret}
We assume the aforementioned conditions in \thmref{thm: eNE} hold, and $\forall \instance \in \discreteSet$, $\instance \neq \instance^*$, it holds that $\globalf(\instance) > \epsilon$. Then we have $$\Pr{\globalf(\instance^T)= 0 } \geq 1 - \delta$$
\end{cor}
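
A short sketch of why the corollary follows, assuming that the queried point $\instance^T$ lies in $\discreteSet$ and that $\globalf(\instance^*) = 0$: on the event of probability at least $1-\delta$ from \thmref{thm: eNE},
$$\globalf(\instance^T) \leq \epsilon \ \text{ and } \ \globalf(\instance) > \epsilon \ \ \forall \instance \in \discreteSet \setminus \{\instance^*\} \ \Rightarrow \ \instance^T = \instance^* \ \Rightarrow \ \globalf(\instance^T) = 0.$$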

Similarly, if from some round $t'$ before $T$ onward the ROI consists only of a group of suboptimal candidates that are sufficiently close to $\instance^*$ and meet the condition assumed in \thmref{thm: eNE}, then the algorithm achieves sublinear cumulative regret after identifying this near-optimal region, and is therefore no-regret after $t'$.
\begin{cor}\label{cor: cum-regret}
We assume the aforementioned conditions in \thmref{thm: eNE} hold, and $\exists t' < T$ such that $t'\geq \frac{\beta \maxInfoARISE_{t'} \hat{C}_1}{\epsilon^2}$. Then we have $$\Pr{\sum_{t=t'}^{T}\globalf(\instance^t) \leq {\sqrt{T\beta\maxInfo_{T}\hat{C}_1}}} \geq 1-\delta$$
\end{cor}
\begin{remark}
    The result above shows that \algoref{alg:main} achieves no regret after identifying the near-optimal region and that the cumulative regret is sublinear. Although the analysis assumes a discretization that contains the Nash Equilibria, the result also applies to the continuous version of the problem, provided that the discretization is sufficiently dense and the utilities satisfy an additional smoothness guarantee. In that case, the density combined with the assumed smoothness translates into an additional approximation error due to the discretization.
\end{remark}
