\section{Introduction}
\vspace{5mm}
The Nash equilibrium (NE) is a fundamental concept in game theory and represents a stable point of strategic interaction in multi-agent systems. The computation of NE has been extensively explored. Existing computational studies \citep{bacsar1987relaxation,li1987distributed,uryas1994relaxation} have provided valuable insights into equilibrium existence, complexity, and algorithmic solutions when agents' utility information is public knowledge. However, in a game involving many agents, it is unrealistic to expect that anyone possesses an explicit representation of the agents' utility functions, even if the game itself has a succinct representation. In many real-world scenarios, a more reasonable modeling assumption is that, given the strategy profile of all agents, we can query the corresponding utilities.

Our focus lies in developing algorithms that discover NE through a series of queries, where each query proposes a strategy profile and receives information about the corresponding utilities of all agents. Such games are also referred to as black-box or simulation-based games \citep{wellman2006methods,jordan2008searching,vorobeychik2010probabilistic,fearnley2015learning}. For instance, we can envision an agent-based combat simulation in which the analyst configures the strategic parameters of the adversaries and executes the simulation to obtain a representative outcome of a battle or campaign \citep{vorobeychik2009game}. Other examples include simulation-based game-theoretic analyses of supply chains \citep{vorobeychik2006empirical} and simultaneous ascending auctions \citep{wellman2008bidding}. This model is motivated by the now-common practice of ``\textit{centralized training, decentralized execution}'' in multi-agent learning (originating from the influential work of \cite{NIPS2017_68a97503}). That is, in many robotics and game-playing applications (e.g., OpenAI Gym), the learning environments are well-defined, so the game parameters can be learned in a centralized fashion by controlling agents' action profiles. Thus, the agents can learn to play the NE strategy from the perspective of a centralized game analyst, and then deploy the learned strategies in the decentralized environment to play against unknown opponents.

To learn the NE of such black-box games through queries, it is crucial to estimate how far each queried strategy profile is from the NE. In essence, this means estimating whether each agent has an incentive to deviate from the queried strategy. As a result, each query involves computing the optimal deviation of every agent from the specified strategy. This process is inherently computationally expensive, as it requires optimization of an unknown utility function for each agent. To summarize, we make the following assumption about the agents' utility functions in the black-box games mentioned above.

\begin{assumption} 
We assume the utility functions may have some regularity properties but are possibly strongly non-convex. Queries on the utility functions result from an expensive process and can be corrupted by noise. 
\end{assumption}

In light of the above assumption and the intrinsic cost of querying utility functions, we employ a Gaussian process (GP) \citep{garnett2023bayesian} as an effective tool for
tackling such black-box optimization problems.
This paper investigates the application of GPs to learning the Nash equilibrium.

\paragraph{Our Results and Implications.}  Given the lack of agents' utility information and the cost of each query, this paper studies efficient no-regret learning of the NE for black-box games via GPs. To the best of our knowledge, there are no existing GP algorithms for learning NE with a known no-regret guarantee.
The key innovation in our work is the design of a novel GP objective tailored to NE learning. Specifically, we characterize equilibrium computation as an optimization problem involving an unknown loss function. This function represents the maximum utility gain that agents can achieve by deviating from the given strategy. Notably, the loss is zero exactly at an NE, where no agent can improve their utility by changing their strategy given the strategies of others.
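One way to make this loss concrete is sketched below. The notation is assumed here purely for illustration and may differ from the formal definition: $n$ agents, a strategy profile $x = (x_i, x_{-i})$, a strategy set $\mathcal{X}_i$ and utility $u_i$ for agent $i$, with the maxima assumed to be attained. Taking the worst case over agents is likewise an assumption, since the gains could instead be aggregated by a sum.
\[
  \ell(x) \;=\; \max_{i \in \{1,\dots,n\}} \; \max_{x_i' \in \mathcal{X}_i} \bigl[\, u_i(x_i', x_{-i}) - u_i(x_i, x_{-i}) \,\bigr].
\]
Under this form, $\ell(x) \ge 0$ for every $x$, because the choice $x_i' = x_i$ is always feasible, and $\ell(x) = 0$ holds exactly when no agent has a profitable unilateral deviation, that is, when $x$ is an NE.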

A critical aspect of our approach is that each query to the loss function involves calculating every agent's optimal deviation from the given strategy. As noted above, this makes each query expensive, since it requires optimizing an unknown utility function for each agent.
Our main result is a no-regret learning algorithm with a theoretical guarantee of convergence to the NE.
We demonstrate the algorithm's effectiveness and compare its regret against that of recent algorithms in the literature on a collection of classical structured games as well as a real-world marketing budget allocation game.