
\section{Algorithms}
\subsection{Approximation of the Partial Maximum}
Before discussing the proposed algorithm, we describe the method of \citet{al2018approximating}. Recall that computing the loss $f(\x)$ requires, for every agent $i \in \uSpace$, the values of $\utilF(\x)$ and $\max_{x'_i} \utilF(x'_i,\x_{-i})$, whose difference $\max_{x'_i} \utilF(x'_i,\x_{-i}) - \utilF(\x)$ is the gain of agent $i$ from a unilateral deviation. First, they approximate $\utilF(\x)$ by the mean of the GP posterior, $\mu_{\utilF, t}(\x)$, as defined in \eqref{eq:gp_posterior}. 

The more involved part is approximating the \textit{partial maximum}, i.e., $\partU(\minusIX) \triangleq \max_{\partX} \utilF(\partX, \minusIX)$.
They approximate it by the mean of the posterior means $\mu_{\utilF,t}(x'_i, \x_{-i})$ over the alternative strategies $x'_i$ of agent $i$, plus a multiple of their standard deviation, i.e., \[\max_{x'_i} \utilF(x'_i,\x_{-i}) \approx \mu_{\partU}(\x_{-i}) + \tau\sigma_{\partU}(\x_{-i}),\] where $\mu_{\partU}(\x_{-i})$ and $\sigma_{\partU}(\x_{-i})$ denote the mean and standard deviation of $\partU(\x_{-i})$, and $\tau$ is a hyper-parameter of the algorithm. Formally, given the observation history $\ddd^{1:t}$, they are computed as
\begin{small}
\begin{equation}\label{eq:conditional_dist}
    \begin{split}
        \mu_{v_i, t}(\x_{-i}) &= \ee_{x'_i}\big[\mu_{\utilF, t}(x'_i, \x_{-i})\big], \\
        \sigma^2_{v_i, t}(\x_{-i}) &= \ee_{x'_i}\big[ \big(\mu_{\utilF, t}(x'_i, \x_{-i})- \mu_{v_i, t}(\x_{-i})\big)^2\big].
    \end{split}
\end{equation}
\end{small}
The loss can therefore be approximated as \begin{equation}\label{eq:aq}\hat{f}(\x|\ddd^{1:t}) \triangleq \max_i \big[\mu_{v_i, t}(\x_{-i})+ \tau \sigma_{v_i, t}(\x_{-i}) - \mu_{\utilF,t}(\x)\big] \approx f(\x).\end{equation}
\looseness -1 \citet{al2018approximating} used \eqref{eq:aq} as the acquisition function and selected the query point $\x^{t+1} = \arg\min_{\x} \hat{f}(\x|\ddd^{1:t})$ for the next round $t+1$. However, an acquisition function in BO should in general balance exploration and exploitation, whereas minimizing \eqref{eq:aq} is pure exploitation, i.e., it samples from the areas of $\xxx$ that appear optimal under the posterior mean of the GP model, without accounting for the posterior uncertainty. 
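For concreteness, the approximation in \eqref{eq:conditional_dist} and \eqref{eq:aq} can be sketched in Python for a finite game. This is only an illustration under our own assumptions: the joint strategy space is encoded as an array whose axis $i$ indexes the strategies of agent $i$, the GP posterior means $\mu_{\utilF,t}$ are given on that grid, the expectation over $x'_i$ is uniform over agent $i$'s strategies, and the function names are hypothetical.

```python
import numpy as np


def approx_partial_max(mu_i: np.ndarray, axis: int, tau: float) -> np.ndarray:
    """Approximate max_{x'_i} u_i(x'_i, x_{-i}) by mean + tau * std of the
    posterior means over agent i's own strategies (array axis `axis`).

    The result keeps a singleton dimension on `axis`, so it broadcasts
    back against the joint grid."""
    mean = mu_i.mean(axis=axis, keepdims=True)
    std = mu_i.std(axis=axis, keepdims=True)  # ddof=0, i.e. sqrt(E[(mu - mean)^2])
    return mean + tau * std


def approx_loss(mus: list, tau: float) -> np.ndarray:
    """f_hat(x) = max_i [mu_{v_i}(x_{-i}) + tau * sigma_{v_i}(x_{-i}) - mu_{u_i}(x)]."""
    gaps = [approx_partial_max(mu, axis=i, tau=tau) - mu for i, mu in enumerate(mus)]
    return np.max(np.stack(gaps), axis=0)


# Toy 2x2 coordination game; the "posterior means" are set to the true payoffs.
mu_1 = np.array([[1.0, 0.0], [0.0, 1.0]])  # agent 1 chooses the row
mu_2 = np.array([[1.0, 0.0], [0.0, 1.0]])  # agent 2 chooses the column
f_hat = approx_loss([mu_1, mu_2], tau=1.0)  # [[0, 1], [1, 0]]

# Greedy (pure-exploitation) query: x^{t+1} = argmin_x f_hat(x)
x_next = tuple(int(k) for k in np.unravel_index(np.argmin(f_hat), f_hat.shape))  # (0, 0)
```

With two strategies per agent, the mean plus one standard deviation recovers the maximum exactly, so $\hat f$ here equals the exact max-regret loss; with more strategies, the quality of the approximation depends on $\tau$.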

\subsection{Adaptive Level-set Estimation for Global Optimization}
We take inspiration from recent advances in high-dimensional Bayesian optimization (HDBO) by \citet{zhang2023learning} and integrate the idea of \citet{al2018approximating} into its framework to optimize the objective in \eqref{eq:loss} efficiently, with a rigorous theoretical guarantee on the convergence rate. 
First, we approximate the unknown $\partU(\minusIX) \triangleq \max_{\partX} \utilF(\partX, \minusIX)$ by its upper confidence bound (\UCB) and lower confidence bound (\LCB) derived from the marginalized $\GP_{\partU} \triangleq \GP_{\utilF|\minusIX}$:
\begin{small}
\begin{align}\label{eq:partialUCB}
\begin{split}
    &\UCB_{\partU, t}(\minusIX, \Space) \triangleq \\
    &\max_{\partX: (\partX, \minusIX) \in \Space}{\mu_{\utilF,t-1}(\partX, \minusIX) + \beta^{1/2}\sigma_{\utilF,t-1}(\partX, \minusIX)},
\end{split}
\end{align}
\begin{align}\label{eq:partialLCB}
\begin{split}
    &\LCB_{\partU, t}(\minusIX, \Space) \triangleq \\
    &\max_{\partX: (\partX, \minusIX) \in \Space} {\mu_{\utilF,t-1}(\partX, \minusIX) - \beta^{1/2}\sigma_{\utilF,t-1}(\partX, \minusIX)},
\end{split}
\end{align}
\end{small}

\noindent where $\beta$ controls the confidence level and is discussed in the analysis below, and $\Space$ denotes the domain over which the marginal maximum is taken. We will show that \eqref{eq:partialUCB} and \eqref{eq:partialLCB} form a high-probability confidence interval for $\partU$ whose width is bounded after a sufficient number of iterations.
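On a discretized game, \eqref{eq:partialUCB} and \eqref{eq:partialLCB} reduce to maxima of the pointwise bounds along agent $i$'s own strategy axis. The following is a minimal sketch under a hypothetical grid encoding (axis $i$ indexes agent $i$'s strategies; posterior means and standard deviations are given as arrays); the optional boolean mask stands in for the domain $\Space$, and the names are ours.

```python
from typing import Optional, Tuple

import numpy as np


def partial_confidence_bounds(
    mu_i: np.ndarray,
    sigma_i: np.ndarray,
    axis: int,
    beta: float,
    space: Optional[np.ndarray] = None,
) -> Tuple[np.ndarray, np.ndarray]:
    """UCB and LCB of v_i(x_{-i}) = max_{x'_i} u_i(x'_i, x_{-i}).

    Both bounds maximize the pointwise bounds over agent i's own strategies
    (array axis `axis`); `space` is an optional boolean mask of the joint grid
    playing the role of the domain S, so only (x'_i, x_{-i}) in S are used."""
    if space is None:
        space = np.ones(mu_i.shape, dtype=bool)
    ucb_u = mu_i + np.sqrt(beta) * sigma_i
    lcb_u = mu_i - np.sqrt(beta) * sigma_i
    # Masked maxima; a slice with no entry in S would yield -inf.
    ucb_v = np.max(ucb_u, axis=axis, keepdims=True, where=space, initial=-np.inf)
    lcb_v = np.max(lcb_u, axis=axis, keepdims=True, where=space, initial=-np.inf)
    return ucb_v, lcb_v


mu_1 = np.array([[1.0, 0.0], [0.0, 1.0]])  # agent 1 chooses the row (axis 0)
sigma_1 = np.full((2, 2), 0.1)
ucb_v1, lcb_v1 = partial_confidence_bounds(mu_1, sigma_1, axis=0, beta=4.0)
# ucb_v1 == [[1.2, 1.2]] and lcb_v1 == [[0.8, 0.8]]: the true partial maximum 1.0 lies in between
```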

Second, we modify the superlevel-set estimation and filtering of \citet{zhang2023learning} to filter the search space efficiently during optimization. 

The original HDBO algorithm of \citet{zhang2023learning} uses the confidence interval of a global Gaussian process $\mathcal{GP}$ to define the upper confidence bound 
$ \UCBit_{t}(\instance) \triangleq \mu_{t-1}(\instance) + \beta^{1/2}\sigma_{t-1}(\instance)$ 
and the lower confidence bound $ \LCB_{t}(\instance) \triangleq\mu_{t-1}(\instance) - \beta^{1/2}\sigma_{t-1}(\instance)$, where $\sigma_{t-1}(\instance) = k_{t-1}(\instance,\instance)^{1/2}$ and $\beta$ acts as a scaling factor. 
The maximum of the global lower confidence bound, $ \LCB_{t, \max} \triangleq \max_{\instance\in\searchSpace}  \LCB_{t}(\instance)$, then serves as the threshold for filtering out candidates whose \UCB falls below it. 
The remaining candidates form a superlevel set of the search space $\searchSpace$ that contains the global optimum with high probability.
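The superlevel-set filtering of \citet{zhang2023learning} summarized above can be sketched as follows for a maximization problem on a discretized search space. The posterior arrays, the toy numbers, and the function name are illustrative assumptions, not taken from their implementation.

```python
import numpy as np


def superlevel_set_filter(mu: np.ndarray, sigma: np.ndarray, beta: float) -> np.ndarray:
    """Keep the candidates whose UCB reaches the largest LCB (maximization).

    On the event that all confidence intervals hold, a point whose UCB lies
    below max_x LCB(x) is worse than the maximizer of the LCB, so it cannot
    be the global maximum and is filtered out."""
    ucb = mu + np.sqrt(beta) * sigma
    lcb = mu - np.sqrt(beta) * sigma
    return ucb >= lcb.max()


mu = np.array([0.0, 1.5, 2.0, 0.5])
sigma = np.array([0.1, 0.5, 0.1, 0.2])
keep = superlevel_set_filter(mu, sigma, beta=1.0)
# keep == [False, True, True, False]: only candidates 1 and 2 remain
```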

Analogously, we combine the confidence intervals of the Gaussian processes $\mathcal{GP}_{\utilF}$ 
with the marginalized \UCB and \LCB in \eqref{eq:partialUCB} and \eqref{eq:partialLCB} 
to define confidence bounds on the objective in \eqref{eq:loss}.

For each utility function $\utilF$ and iteration $t$, we have the corresponding upper and lower confidence bounds: 
\begin{align*}
\UCBit_{\utilF, t}(\instance) &\triangleq \mu_{\utilF,t-1}(\instance) + \beta^{1/2}\sigma_{\utilF,t-1}(\instance) \\
\LCB_{\utilF, t}(\instance) &\triangleq\mu_{\utilF,t-1}(\instance) - \beta^{1/2}\sigma_{\utilF, t-1}(\instance).
\end{align*}
Then we have the \UCB and \LCB for $\globalf$:
\begin{small}
\begin{align}
    \UCBit_{\globalf, t}(\instance, \Space) \triangleq \sum_{i \in \uSpace} \big(\UCBit_{\partU, t}(\minusIX, \Space) - \LCB_{\utilF, t}(\instance)\big)
\end{align}
\end{small}
\begin{small}
\begin{align}
    \LCB_{\globalf, t}(\instance, \Space) \triangleq \sum_{i \in \uSpace} \big(\LCB_{\partU, t}(\minusIX, \Space) - \UCBit_{\utilF, t}(\instance)\big)
\end{align}
\end{small}
\reviseFx{Since $\globalf(\instance^*)=0$ indicates that a Nash equilibrium is achieved at $\instance^*$, the minimum of $\LCB_{\globalf, t}$ over a search space containing $\instance^*$ is at most $\globalf(\instance^*)=0$ with high probability, and it approaches $0$ as $t \rightarrow \infty$. This property is reflected in \thmref{thm: simReg} below.}
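
The first part of this statement follows from a short argument. As an illustration, assume that the loss decomposes as $\globalf(\instance) = \sum_{i \in \uSpace} \big(\partU(\minusIX) - \utilF(\instance)\big)$, that $\searchSpace$ is the product of the agents' strategy sets, and work on the event that $\LCB_{\utilF,t}(\instance) \le \utilF(\instance) \le \UCBit_{\utilF,t}(\instance)$ for all $i \in \uSpace$ and $\instance \in \searchSpace$. Maximizing these pointwise bounds over $\partX$ sandwiches the partial maximum, which in turn sandwiches the loss:
\begin{align*}
\LCB_{\partU, t}(\minusIX, \searchSpace) &\le \partU(\minusIX) \le \UCBit_{\partU, t}(\minusIX, \searchSpace), \\
\LCB_{\globalf, t}(\instance, \searchSpace) &\le \sum_{i \in \uSpace} \big(\partU(\minusIX) - \utilF(\instance)\big) = \globalf(\instance) \le \UCBit_{\globalf, t}(\instance, \searchSpace), \\
\min_{\instance \in \searchSpace} \LCB_{\globalf, t}(\instance, \searchSpace) &\le \LCB_{\globalf, t}(\instance^*, \searchSpace) \le \globalf(\instance^*) = 0.
\end{align*}
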
For brevity, we omit the argument $\Space$ when $\Space = \searchSpace$; namely, we write $\UCB_{\globalf, t}(\instance)$ for $\UCB_{\globalf, t}(\instance, \searchSpace)$ and $\LCB_{\globalf, t}(\instance)$ for $\LCB_{\globalf, t}(\instance, \searchSpace)$. Since we are minimizing the loss function $\globalf$, we define the filtering threshold as $\UCBit_{\globalf, t, \min} \triangleq \min_{\instance\in\searchSpace}  \UCBit_{\globalf, t}(\instance)$.
Then, the following sublevel-set
\begin{align}
    \roi^t \triangleq \curlybracket{ \instance \in \searchSpace \mid   \LCB_{\globalf, t}(\instance) \le \min(\UCBit_{\globalf, t, \min}, 0)} \label{eq:roi}
\end{align} 
serves as the region(s) of interest (ROI)\footnote{In practice, since $\UCBit_{\globalf, t, \min}\geq f^*$ with high probability and, by assumption, the search space contains an NE ($f^*=0$), the ROI threshold $\min(\UCBit_{\globalf, t, \min}, 0)$ equals zero with high probability.}.
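
The bounds on $\globalf$ and the ROI in \eqref{eq:roi} can be sketched for a discretized game as follows, with $\Space = \searchSpace$. The grid encoding (axis $i$ indexes agent $i$'s strategies), the given posterior arrays, and the function names are our own assumptions; the sketch also assumes that each agent contributes the differences $\UCBit_{\partU,t} - \LCB_{\utilF,t}$ and $\LCB_{\partU,t} - \UCBit_{\utilF,t}$ to the respective sums.

```python
import numpy as np


def loss_confidence_bounds(mus, sigmas, beta):
    """UCB_f and LCB_f on the full joint grid (domain S = whole search space).

    mus[i], sigmas[i]: GP posterior mean / std of u_i on the joint grid,
    where array axis i indexes agent i's own strategies."""
    root_beta = np.sqrt(beta)
    ucb_f = np.zeros(mus[0].shape)
    lcb_f = np.zeros(mus[0].shape)
    for i, (mu, sd) in enumerate(zip(mus, sigmas)):
        ucb_u, lcb_u = mu + root_beta * sd, mu - root_beta * sd
        ucb_v = ucb_u.max(axis=i, keepdims=True)  # marginalized UCB of v_i
        lcb_v = lcb_u.max(axis=i, keepdims=True)  # marginalized LCB of v_i
        ucb_f += ucb_v - lcb_u  # broadcasts back over agent i's axis
        lcb_f += lcb_v - ucb_u
    return ucb_f, lcb_f


def region_of_interest(ucb_f, lcb_f):
    """Sublevel set {x : LCB_f(x) <= min(min_x UCB_f(x), 0)} as a boolean mask."""
    threshold = min(float(ucb_f.min()), 0.0)
    return lcb_f <= threshold


payoff = np.array([[1.0, 0.0], [0.0, 1.0]])  # 2x2 coordination game, same for both agents
mus = [payoff, payoff]
sigmas = [np.full((2, 2), 0.05), np.full((2, 2), 0.05)]
ucb_f, lcb_f = loss_confidence_bounds(mus, sigmas, beta=1.0)
roi = region_of_interest(ucb_f, lcb_f)
# roi == [[True, False], [False, True]]
```

In this toy game, $\min_{\instance} \UCBit_{\globalf,t}(\instance) \approx 0.2 > 0$, so the threshold is zero, as in the footnote, and the ROI consists of the two pure Nash equilibria.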

\subsection{Efficient High-dimensional Optimization through ROI Reduction}

Throughout the optimization, shrinking the ROI $\roi^t$ alleviates the difficulty of learning over a high-dimensional search space. Figure \ref{fig:roi} illustrates this for learning the NE of Example \ref{example:saddle}, where 10 initialization points already reduce the search space. Combined with the following acquisition function, the proposed algorithm $\algo$ achieves an adaptive trade-off between exploration and exploitation.
\begin{equation}\label{eq:acqF}
    \acqF(\instance, \Space) = \UCB_{\globalf, t}(\instance, \Space) - \LCB_{\globalf, t}(\instance, \Space)
\end{equation}
This acquisition function differs from the well-known variance-reduction acquisition function in active learning \citep{mackay1992information} in two ways. 
First, it is defined on the confidence intervals of each utility function ${\utilF}$ together with the confidence intervals tailored to the marginal maximum ${\partU}$ in \eqnref{eq:partialUCB} and \eqnref{eq:partialLCB}, rather than on the confidence interval of a single global Gaussian process.
Second, as shown below, we only optimize the acquisition function over the subset $\roi^t$ of the search space instead of the whole search space $\searchSpace$. Shrinking $\roi^t$ improves the efficiency of the optimization by avoiding unnecessary queries in regions that, with high probability, contain no NE.
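
The acquisition function in \eqref{eq:acqF} with $\Space = \roi^t$ (line 5 of \algoref{alg:main}) can be sketched in the same hypothetical grid encoding. We assume that the marginal maxima only range over deviations $(\partX, \minusIX)$ that stay inside $\roi^t$, and that the bounds on $\globalf$ are summed per agent, so each agent contributes the width of its marginalized interval plus the width of its utility interval. Names and toy numbers are illustrative.

```python
import numpy as np


def masked_partial_max(values, roi, axis):
    """Max over agent i's own strategies (axis), counting only entries in the ROI.

    Slices without any ROI entry get 0; they never matter, because every point
    in such a slice lies outside the ROI and is masked out of the acquisition."""
    m = np.max(values, axis=axis, keepdims=True, where=roi, initial=-np.inf)
    return np.where(np.isfinite(m), m, 0.0)


def acquisition_on_roi(mus, sigmas, beta, roi):
    """alpha(x, ROI) = UCB_f(x, ROI) - LCB_f(x, ROI) and its argmax over the ROI."""
    root_beta = np.sqrt(beta)
    alpha = np.zeros(roi.shape)
    for i, (mu, sd) in enumerate(zip(mus, sigmas)):
        ucb_u, lcb_u = mu + root_beta * sd, mu - root_beta * sd
        ucb_v = masked_partial_max(ucb_u, roi, axis=i)
        lcb_v = masked_partial_max(lcb_u, roi, axis=i)
        # (UCB_v - LCB_u) - (LCB_v - UCB_u) = width of v_i's bound + width of u_i's bound
        alpha += (ucb_v - lcb_v) + (ucb_u - lcb_u)
    alpha = np.where(roi, alpha, -np.inf)  # only ROI points are candidates
    x_next = tuple(int(k) for k in np.unravel_index(np.argmax(alpha), roi.shape))
    return alpha, x_next


payoff = np.array([[1.0, 0.0], [0.0, 1.0]])
sigma = np.array([[0.1, 0.1], [0.1, 0.3]])
roi = np.array([[True, False], [False, True]])
alpha, x_next = acquisition_on_roi([payoff, payoff], [sigma, sigma], beta=1.0, roi=roi)
# x_next == (1, 1)
```

Here the ROI point $(1,1)$ has the larger posterior standard deviation, so its confidence bounds on $\globalf$ are wider and it is queried next.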

\begin{algorithm*}%[H]
    \caption{\textbf{\underline{A}}daptive \textbf{\underline{R}}egion of \textbf{\underline{I}}nterest \textbf{\underline{S}}earch for Nash \textbf{\underline{E}}quilibrium (\algname)}
    \label{alg:main}
    
        \begin{algorithmic}[1]
            \STATE {\bf Input}: Search space $\searchSpace$, initial observations $\Selected^0$, horizon $T$;
            \FOR{$t = 1$ to $T$}
                \STATE Fit the Gaussian processes $\mathcal{GP}_{\utilF,t}$: $\algParam_{\utilF,t} \leftarrow \argmin_{\algParam_{\utilF}}-\log \Pr{y_i^{1:t-1}\mid \instance^{1:t-1},\algParam_{\utilF}}$  
                \STATE
                Identify ROIs via sublevel-set estimation
                $\roi^t \leftarrow \{\instance\in \searchSpace \mid \LCB_{\globalf, t}(\instance) \leq 0 \}$ \label{alg:ln:filtering}
              
                \STATE Optimize the {sublevel-set} acquisition function: $\instance^{t} \leftarrow \argmax\limits_{\instance\in \roi^t}{\acqF(\instance, \roi^t)}$ as in \eqref{eq:acqF}
                
                \STATE Query $\instance^{t}$, observe the utilities $\y^{t}$, and update $\Selected^{1:t} \leftarrow \Selected^{1:t-1} \cup \{(\instance^{t}, \y^{t})\}$
            \ENDFOR
            \STATE {\bf Output}: $\argmin\limits_{\instance\in \roi^T}\LCB_{\globalf, T}(\instance)$
        \end{algorithmic}
    \end{algorithm*}

ROI identification can be computationally expensive, especially in a high-dimensional search space, because it requires point-wise comparisons; its cost therefore depends heavily on the size and distribution of the discretization of the search space. Shrinking the ROI along the optimization helps mitigate this cost. In the following section, \lemref{lem: mono_ci} shows that the ROI identification in line 4 of \algoref{alg:main} can be written equivalently as
\begin{equation}\label{eq:roi_new}
\roi^t = \{\instance\in \roi^{t-1} \mid \LCB_{\globalf, t}(\instance) \leq 0 \}
\end{equation}
when setting $\roi^0 = \searchSpace$. Hence, ROI identification is a hierarchical filtering of the search space: each iteration only re-examines the points retained in $\roi^{t-1}$, so it becomes cheaper as the ROI keeps shrinking. 
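
The hierarchical filtering in \eqref{eq:roi_new} can be sketched as follows. For illustration we assume that $\LCB_{\globalf,t}$ is non-decreasing in $t$ at every point (nested confidence intervals); under this simplifying assumption, a point with $\LCB_{\globalf,t} \le 0$ also satisfied $\LCB_{\globalf,t-1} \le 0$, so re-filtering only $\roi^{t-1}$ yields the same set as filtering all of $\searchSpace$. The precise condition is the one in \lemref{lem: mono_ci}; the numbers and names below are hypothetical.

```python
import numpy as np


def hierarchical_roi(lcb_per_iteration):
    """Shrink the ROI by re-filtering only the points kept at the previous step.

    lcb_per_iteration[t] holds LCB_f at iteration t on the whole discretized
    search space; it is precomputed here only to keep the example short, and
    only the entries indexed by the current ROI are actually read."""
    roi_idx = np.arange(lcb_per_iteration[0].size)  # roi^0 = search space
    n_evaluations = 0
    for lcb_t in lcb_per_iteration:
        n_evaluations += roi_idx.size  # one comparison per retained point
        roi_idx = roi_idx[lcb_t[roi_idx] <= 0.0]
    return roi_idx, n_evaluations


# LCB_f values that are non-decreasing in t at every point (nested intervals)
lcbs = [
    np.array([-1.0, -0.5, 0.2, -0.1, 0.0]),
    np.array([-0.8, 0.1, 0.3, -0.05, 0.0]),
    np.array([-0.2, 0.4, 0.5, 0.1, 0.0]),
]
roi_idx, n_evals = hierarchical_roi(lcbs)
full_filter = np.flatnonzero(lcbs[-1] <= 0.0)  # filter the whole space at the last step
# roi_idx == full_filter == [0, 4], using 12 comparisons instead of 15
```

The final ROI matches full filtering while reading fewer lower-bound values; the savings grow as the ROI shrinks faster.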
The rate at which the ROI shrinks is not guaranteed, which may make the performance unstable in HDBO tasks. One potential remedy is to incorporate existing, orthogonal HDBO techniques, including sparse GPs \citep{mcintire2016sparse, moss2023inducing} and dimension reduction for BO \citep{song2022monte, wang2016bayesian, Letham2020Re, HeSBO19, papenmeier2022increasing}. However, these methods require additional structural assumptions that do not necessarily hold in NE discovery and should therefore be applied with caution, depending on the application. 

\begin{remark}
    The proposed algorithm \algname targets games with discretized strategy spaces, which enable ROI identification, similar to previous work by 
    \citet{picheny2019bayesian}. For continuous search spaces, where no known smoothness guarantee allows a discretization that supports efficient ROI identification, we propose an optional method in \appref{sec: copt} that accelerates candidate selection in high-dimensional spaces: it formulates the ROI identification and the acquisition function optimization in lines 4 and 5 of \algoref{alg:main} jointly as a conventional constrained optimization problem and solves it efficiently with an off-the-shelf solver.
\end{remark}