% \section{Computing SW-SPNE and SW-SPCE}
\section{Generalized BI}

% \gabriel{maybe just cite \citep{SL-B09}?}

We now consider how to \emph{compute} equilibria for NS-CSGs.
For a fixed initial state, finite-horizon NS-CSGs are finite games, obtained by unfolding the game tree while invoking the NN perception function.   
In principle, this allows us to employ established game-theoretic solution methods such as backward induction.
%In addition to admitting classical game-theoretic solutions, this has the advantage that the NN perception mechanism can be handled outside the optimisation problem
%(in contrast to \citep{MEA-EB-PK-AL:20}, where perception functions are assumed to be piecewise linear and encoded as constraints within an MILP problem). %\eqref{eq:computation-NE} and \eqref{eq:reward-constraints}.{\color{blue} This is advantageous to the extent that it allows for classic game-theoretic solutions to be applied and also makes it possible for the treatment of the perception mechanisms to be done outside the optimisation problem. It is worth pointing out that, as done in \cite{MEA-EB-PK-AL:20}, it would possible to encode these elements as functions, which would then appear when expressing transitions and rewards in \eqref{eq:computation-NE} and \eqref{eq:reward-constraints}. For the problem we consider here, however, this would add to the complexity of the problem with no clear gain. We plan to investigate the impact and benefits of this alternative in future work, particularly when considering infinite-horizon properties. Among the techniques that are commonly applied to finite-horizon games,} %
%Since we focus on finite-horizon equilibria, we can avoid this increase in complexity at the cost of fixing an initial state. 
We next prove that
the classical \emph{generalized backward induction} (GBI) \citep{SL-B09} can be used to find a finite-horizon SPNE or SPCE through local optimisation, but that this equilibrium might have 
% \marta{arbirarily bad} \rui{yes} 
arbitrarily bad social welfare. 

% Thus, we will discuss how to compute an SPNE or SPCE that is socially optimal, that is, an SW-SPNE or SW-SPCE.


%\gabriel{the notation using $\mu$ has not been introduced and can be confusing. should maybe come after the example.}
%\martatodo{Agreed, this needs rearranging}
%\rui{I used $\mu$ to denote a strategy profile for normal-form games which now have been moved to the beginning of Section 2.}
%\gabriel{here it's not quite a normal-form game though. and the notation with the indexes, e.g. $\mu^{4(1)}$, is not immediately clear.}


% Before proceeding with the compuation of SW-SPE, we first discuss the subgame perfection of SW-SPE in terms of social welfare, and reveal that \rev{an intuitive algorithm proposed in the current works} possibly find an SPE with an arbitrarily bad social welfare with respect to the optimum.



% \martatodo{Strictly speaking, our definition of the game must include environment, so we need to find a way to justify the example}

% \ruitodo{Yes, the environment is necesary in our model. Example 1 assumes an environment with dummy actions. I should mention this in the example.}




% \begin{figure}[tbp]
%     \centering
%     \begin{tikzpicture}[
%     roundnode/.style={circle, radius = 0.5mm, draw=white, inner sep=0pt, minimum size=12pt},
%     squarednode/.style={rectangle, draw=white,  inner sep=0pt, minimum size=12pt},
%     ]
%     % nodes
%     \node[roundnode,fill=red!60]  (node1) at (0,0) {1};
%     \node[roundnode,fill=red!45]  (node2) at (-3,-1) {2};
%     \node[roundnode,fill=red!45]  (node3) at (-1,-1) {3};
%     \node[roundnode,fill=red!45]  (node4) at (1,-1) {4};
%     \node[roundnode,fill=red!45]  (node5) at (3,-1) {5};
%     \node[squarednode,fill=red!30] (node6) at (-2,-2.5) {6};
%     \node[squarednode,fill=red!30] (node7) at (0,-2.5) {7};
%     \node[squarednode,fill=red!30] (node8) at (2,-2.5) {8};
%     \node[squarednode,fill=red!30] (node9) at (4,-2.5) {9};
%     % edges
%     \path[draw,-, above] (node1) to node [pos=0.85] {\scriptsize (U,L)}(node2);
%     \path[draw,-, right] (node1) to node [left, pos=0.6] {\scriptsize (U,R)}(node3);
%     \path[draw,-, left] (node1) to node [right, pos=0.6] {\scriptsize (D,L)}(node4);
%     \path[draw,-, above] (node1) to node [pos=0.85] {\scriptsize (D,R)}(node5);
%     \path[draw,-, above] (node4) to node [pos=0.9] {\scriptsize (U,L)}(node6);
%     \path[draw,-, right] (node4) to node [left, pos=0.775] {\scriptsize (U,R)}(node7);
%     \path[draw,-, left] (node4) to node  [right, pos=0.775] {\scriptsize (D,L)}(node8);
%     \path[draw,-, above] (node4) to node [pos=0.9] {\scriptsize (D,R)}(node9);
%     % payoffs
%     \node[below=2mm] at (node6) {\footnotesize $(0,8)$};
%     \node[below=2mm] at (node7) {\footnotesize $(0,0)$};
%     \node[below=2mm] at (node8) {\footnotesize $(0,0)$};
%     \node[below=2mm] at (node9) {\footnotesize $(5,2)$};
%     \node[below=2mm] at (node2) {\footnotesize $(1,1+\phi)$};
%     \node[below=2mm] at (node3) {\footnotesize $(3,\phi)$};
%     \node[below=2mm] at (node5) {\footnotesize $(0,0)$};
%     \end{tikzpicture}
%     \caption{A two-stage game tree with two agents with $\phi<0$.}
%     \label{fig:example1}
% \end{figure}

% \startpara{Generalized BI via SWE}

\algoref{alg-BI-SWNE} shows a version of the classical GBI method,
for concurrent extensive-form games over a finite horizon,
which aims to find an SPNE or SPCE that maximises social welfare,
by computing an NE or CE which is \emph{locally} social welfare maximal at each history.
In \algoref{alg-BI-SWNE}, $\mathsf{HISTORY}(\csg,s,\ell)$ computes the set of all histories at stage $\ell$ from an initial state $s\in S$, and $\mathsf{SUCCESSOR}(\csg,H_s^{\ell+1},h)$ extracts from $H_s^{\ell+1}$ the set of all successors of a history $h$ at stage $\ell$. $\mathsf{SWE\_SOLVER}\big(\csg,r,\equilibrium,h,\{V^{h'}\,|\,h'\in \textup{Succ}(h)\}\big)$ computes an SWNE or SWCE $\mu^h$ (depending on the equilibrium type $\equilibrium \in \{ \textup{CE}, \textup{NE} \}$) of the induced normal-form game whose actions are those available at $last(h)$ and whose utilities are derived from the equilibrium payoffs $V^{h'}$ of all successors $h'$ of $h$, and assigns the payoff of this equilibrium to $V^h$. The equilibrium payoffs of histories at stage $K$ (i.e., where the game ends) are set to the rewards of their final states, and the procedure is then iterated from the bottom up until $\ell=0$, i.e., $h=s$. 
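
As a sketch of the normal-form game that $\mathsf{SWE\_SOLVER}$ solves at a history $h\in H_s^{\ell}$ with $\ell<K$, suppose (an assumption made here for illustration, not restated from the model definition) that an agent's payoff accumulates its action reward $r_i^A$ and state reward $r_i^S$ at $last(h)$ plus the expected equilibrium payoff of the successors. Writing $u_i^h$ for the induced utility of agent $i\in N$ (notation introduced only for this sketch) and $\delta$ for the transition function of $\csg$, the utility of a joint action $\alpha$ available at $last(h)$ would then be
\begin{equation*}
    u_i^h(\alpha) = r_i^A(last(h),\alpha) + r_i^S(last(h)) + \sum_{h'} \delta(last(h),\alpha)(last(h'))\, V_i^{h'},
\end{equation*}
where the sum ranges over the successors $h'\in\textup{Succ}(h)$ reached from $h$ under $\alpha$. Under the same assumption, and reading $\mu^h$ as a distribution over joint actions (a product distribution in the NE case), the solver selects, among the equilibria of this game, one that maximises $\sum_{i\in N}\sum_{\alpha}\mu^h(\alpha)\,u_i^h(\alpha)$ and returns
\begin{equation*}
    V_i^h = \sum_{\alpha}\mu^h(\alpha)\,u_i^h(\alpha) \quad \textup{for each } i\in N.
\end{equation*}
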
% Since an SWNE or SWCE is computed at each history from the bottom up, 
For this algorithm, we have the following proposition.
% \gabrieltodo{Needs to be rewritten for both NE and CE.}

% \ruitodo{Fixed. Please check}

% We next consider the generalized backward induction (BI)  to compute an SPE and the related social welfare, as follows:
% \begin{itemize}
%     \item Given an initial state, compute the finite set of all possible paths and also the finite set of associated states;
    
%     \item Assign the zero vector to the final state of each path;

%     \item Pick a state whose successor states all have been assigned with values;
    
%     \item Compute a social welfare NE (SWNE) of an induced normal-form game with actions available at this state and utilities from the values of its successor states;
    
%     \item Assign the payoff vector associated with this equilibrium to this state, and eliminate the successor states;
    
%     \item Iterate this procedure until a move is assigned at every contingency, when there remains no successor state to eliminate.
% \end{itemize}

\begin{algorithm}[t]
	\caption{Generalized backward induction (GBI) via SWE\label{alg-BI-SWNE}}
	\textbf{Input:} NS-CSG $\csg$, rewards $r$, equ. type $\equilibrium$, initial state $s$
	
	\textbf{Output:} an equilibrium $\mu$, equilibrium payoff vector $V$
	\begin{algorithmic}[1]
	    \State $H_s^{\ell}\gets \mathsf{HISTORY}(\csg,s,\ell)$ for all $\ell\leq K$
	    
	    \State \textbf{for }$\ell=K,K-1,\dots,0;h\in H_s^{\ell}$ \textbf{do}
	    
	    \State \quad \textbf{if} $\ell=K$ \textbf{then} 
	    
	    \State \qquad $V^{h}\gets (r_1^S(last(h)),\dots, r_n^S(last(h)))$
	    
	    \State \quad \textbf{else}
	    
	    \State \qquad $\textup{Succ}(h)\gets \mathsf{SUCCESSOR}(\csg,H_s^{\ell+1},h)$ 
	    
	    \State \qquad  $(\mu^{h},V^{h})\gets\mathsf{SWE\_SOLVER}\big(\csg,r,\equilibrium,h,$
	    \Statex \hspace{11.5em}$\{V^{h'}\,|\,h'\in \textup{Succ}(h)\}\big)$ 
	    
	   % \State $\sigma(h)\gets \mu^h$ for all $h\in H_s^{<K}$
	    
	   % \State $W_{0,s}^{\sigma}\gets \sum_{i\in N}V^s_i$
	   
	   \State $\mu \gets \{ \mu^h \}_{h \in H_s^{<K}}$, $V \gets \{ V^h \}_{h \in H_s}$
	    
	    \State \textbf{return} $\mu, V$
	\end{algorithmic}
\end{algorithm}

% We first consider fully observable stochastic games in which every agent has access to the path $\pi$ at each step. This scenario makes sense when we as a coordinator want to find an SW-SPE and then dispatch the corresponding distribution over actions to each agent at each step. The property of SPE can guarantee that no agent has the motivation to deviate from the suggested distributions due to no benefits, while the property of SW-SPE further guarantees the optimal social welfare. 

% Thus, in the case, the strategy of each agent is as follows.

% \begin{defi}[Fully-observable strategy]
% A fully-observable strategy for agent $i\in N\cup \{E\}$ in an NS-CSG $\csg$ is a function of the form $\sigma_i:FPaths_{\csg}\to Dist(A_i\cup\{\perp\})$ such that, if $\sigma_i(\pi)(a_i)>0$, then $a_i\in A_i(last(\pi))$. We denote by $\Sigma_{\csg}^i$ the set of all strategies for agent $i$.
% \end{defi}

% For any given initial state, the finite-horizon NS-CSGs are finite games. We adopt generalized backward induction (BI) to compute an SPE and the related social welfare, as follows:
% \begin{itemize}
%     \item Given an initial state, compute the finite set of all possible paths and also the finite set of associated states;
    
%     \item Assign the zero vector to the final state of each path;

%     \item Pick a state whose successor states all have been assigned with values;
    
%     \item Compute a social welfare NE (SWNE) of an induced normal-form game with actions available at this state and utilities from the values of its successor states;
    
%     \item Assign the payoff vector associated with this equilibrium to this state, and eliminate the successor states;
    
%     \item Iterate this procedure until a move is assigned at every contingency, when there remains no successor state to eliminate.
% \end{itemize}


% \begin{algorithm}[t]
% 	\caption{Generalized BI via Recursive SWNE\label{alg-greedy-SWNE}}
% 	\textbf{Input:} NS-SCG $\csg$, objective profile $Y$, initial state $s\in S$
	
% 	\textbf{Output:} an SPE $\sigma$, social welfare $W_{1,s}^{\sigma}$
% 	\begin{algorithmic}[1]
% 	    \State $H_s^{\ell}\gets \mathsf{HISTORY}(\csg,s,\ell)$ for $\ell\in[K]$
	    
% 	    \State \textbf{for }$\ell\in[K],h\in H_s^{\ell}$ \textbf{do}
	    
% 	    \State \quad \textbf{if} $\ell=K$ \textbf{then} 
	    
% 	    \State \qquad $V^{h}\gets 0_{1\times n}$
	    
% 	    \State \quad \textbf{else}
	    
% 	    \State \qquad $\hat{H}_h\gets \mathsf{SUCCESSOR}(\csg,H_s^{\ell+1},h)$ 
	    
% 	    \State \qquad  $(\mu^{h},V^{h})\gets\mathsf{SWNE\_SOLVER}\big(\csg,r,h,$
% 	    \Statex \hspace{11.5em}$\{V^{h'}\,|\,h'\in \hat{H}_{h}\}\big)$ 
	    
% 	    \State $\sigma\gets\{\mu^{h}\,|\,h\in H_s^{\ell},\ell\in[K-1]\}$, $W_{1,s}^{\sigma}\gets \sum_{i\in N}V^s(i)$
	    
% 	    \State \textbf{return} $\sigma,W_{1,s}^{\sigma}$
% 	\end{algorithmic}
% \end{algorithm}

% \begin{algorithm}[t]
% 	\caption{Generalized BI via greedy SWNE\label{alg-greedy-SWNE}}
% 	\textbf{Input:} NS-SCG $\csg$, objective profile $Y$, initial state $s\in S$
	
% 	\textbf{Output:} an SPE $\sigma$, social welfare $W_{\csg,s}^{\sigma}$
% 	\begin{algorithmic}[1]
% 	    \State $FPath_{\csg,s}\gets \mathsf{PATH}(\csg,s)$
	    
% 	    \State $S_k\gets \mathsf{STATE}_k(FPath_{\csg,s})$ for all $0\leq k\leq K$
	    
% 	    \State \textbf{for }$0\leq k \leq K,s'\in S_k$ \textbf{do}
	    
% 	    \State \quad \textbf{if} $k=0$ \textbf{then} 
	    
% 	    \State \qquad $V_{\csg}(s',k)\gets 0_{1\times n}$
	    
% 	    \State \quad \textbf{else}
	    
% 	    \State \qquad $F_{s'}^{\alpha}\gets \mathsf{SUCCESSOR}(FPath_{\csg,s},s',\alpha)$ 
	    
% 	    \State \qquad  $(\mu(s',k),V_{\csg}(s',k))\gets\mathsf{SWNE\_SOLVER}\big(\csg,r,s',$
% 	    \Statex \hspace{11.5em}$\{V_{\csg}(s'',k-1)\,|\,s''\in F_{s'}^{\alpha}\}\big)$ 
	    
% 	    \State $\sigma\gets\{\mu(s',k)\,|\,1\leq k\leq K,s'\in S_k\}$
% 	    \Statex $W_{\csg,s}^{\sigma}\gets \sum_{i\in N}V_{\csg}(s,K)(i)$
	    
% 	    \State \textbf{return} $\sigma,W_{\csg,s}^{\sigma}$
% 	\end{algorithmic}
% \end{algorithm}



% We denote by $[m]$ the set $\{1,\dots,m\}$. We summarize the above method in \algoref{alg-BI-SWNE}, called Generalized BI via greedy SWNE. $\mathsf{Path}(\csg,s)$ computes the paths of length $K+1$ when the game $\csg$ starts in state $s$, and 
% $\mathsf{PATH}_k(FPath_{\csg,s})$ returns a set of paths of length $k$. Also note that $P_{K,s}=\{s\}$. $\mathsf{SUCCESSOR}(FPath_{\csg,s},\pi,\alpha)$ determines the successor paths of path $\pi$ under the joint action $\alpha$. $\mathsf{SWNE\_SOLVER}(\cdot)$ computes an SWNE. We denote by $P_{[K],s}=\cup_{k\in [K]}P_{k,s}$ the set of paths of length not more than $K$.

% I did not find any works analyzing the above algorithm in which different from the classical BI, an SWNE is computed at each step. I think we need the supporting results or provide some conclusions here.




% \gabriel{worth looking at \cite{LRTZ06}.}

\begin{pro}\label{pro:Generalized-SWNE}Given an initial state $s\in S$, GBI finds an SPNE $\sigma$ (SPCE $\tau$, resp.) with social welfare $W_{0,s}^{\sigma} = \sum_{i \in N} V_i^s$ ($W_{0,s}^{\tau} = \sum_{i \in N} V_i^s$, resp.). 
\end{pro}

Although GBI can find an SPNE or SPCE, unfortunately it may return one with an arbitrarily bad social welfare with respect to the optimum.


\begin{lema}[Bad social welfare]\label{lema:bad-sw}
The SPNE $\sigma$ (SPCE $\tau$, resp.) obtained by GBI via SWE can have arbitrarily bad social welfare with respect to an SW-SPNE $\sigma^*$ (SW-SPCE $\tau^*$, resp.) for some state $s\in S$, i.e., the gap $W_{0,s}^{\sigma^*}-W_{0,s}^{\sigma}$ ($W_{0,s}^{\tau^*}-W_{0,s}^{\tau}$, resp.) is positive and unbounded. 
\end{lema}
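
The gap can be seen on a small instance for the NE case. The following is an illustrative sketch, assuming an environment with only dummy actions and payoffs collected only at the outcomes listed. Consider a two-stage, two-agent game in which agent 1 chooses between $U$ and $D$ and agent 2 between $L$ and $R$, and let $\phi<0$. At the initial state $s$, the joint actions $(U,L)$, $(U,R)$ and $(D,R)$ yield the payoffs $(1,1+\phi)$, $(3,\phi)$ and $(0,0)$, respectively, while $(D,L)$ leads to a second stage with payoffs
\begin{equation*}
    \begin{array}{c|cc}
          & L     & R     \\ \hline
        U & (0,8) & (0,0) \\
        D & (0,0) & (5,2)
    \end{array}
\end{equation*}
Among the NEs of the second stage, $(U,L)$ has the largest social welfare, $8$, whereas the pure NE $(D,R)$ has social welfare $7$. GBI therefore selects $(U,L)$ and propagates the payoff vector $(0,8)$ to $s$, which gives the induced game
\begin{equation*}
    \begin{array}{c|cc}
          & L          & R        \\ \hline
        U & (1,1+\phi) & (3,\phi) \\
        D & (0,8)      & (0,0)
    \end{array}
\end{equation*}
Here $U$ is strictly dominant for agent 1 and $L$ is the best response of agent 2 to $U$, so GBI returns an SPNE $\sigma$ with $W_{0,s}^{\sigma}=2+\phi$. If instead $(D,R)$ is played in the second stage, the vector $(5,2)$ is propagated, $(D,L)$ becomes an NE at $s$, and the resulting SPNE has social welfare $7$. Hence
\begin{equation*}
    W_{0,s}^{\sigma^*}-W_{0,s}^{\sigma} \ge 7-(2+\phi) = 5-\phi,
\end{equation*}
which grows without bound as $\phi\to-\infty$.
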
% \begin{proof}
% We postpone the proof to \appxref{sec:appendix-a}.
% We consider \examref{example-1} again. Since $\mu^{4(1)}$ has the maximum social welfare, then the Generalized BI via Recursive SWNE feeds $V^{4(1)}$ to node $1$, thus leading to node $1$'s social welfare $W_{0,s}^{\sigma}=2+\phi$. However, the node $1$'s social welfare $W_{0,s}^{\sigma^*}$ under the SW-SPE $\sigma^*$ is $7$. Thus, if $\phi$ is negative enough, the difference $W_{0,s}^{\sigma^*}-W_{0,s}^{\sigma}=5-\phi$ is positive and unbounded.  
% \end{proof}

% \begin{algorithm}[t]
% 	\caption{Exact computation of SW-SPE\label{alg-exact-SW-SPE}}
% 	\textbf{Input:} NS-SCG $\csg$, objective profile $Y$, initial state $s\in S$
	
% 	\textbf{Output:} an SW-SPE $\sigma$, social welfare $W_{\csg,s}^{\sigma}$
% 	\begin{algorithmic}[1]
% 	    \State $FPath_{\csg,s}\gets \mathsf{PATH}(\csg,s)$
	    
% 	    \State $S_k\gets \mathsf{STATE}_k(FPath_{\csg,s})$ for all $0\leq k\leq K$
	    
% 	    \State \textbf{for }$1\leq k \leq K,s'\in S_k$ \textbf{do}
	    
% 	    \State \quad \textbf{for }$\alpha\in \Delta(s')$ \textbf{do}
	    
% 	    \State \qquad $F_{s'}^{\alpha}\gets \mathsf{SUCCESSOR}(FPath_{\csg,s},s',\alpha)$ 
	    
% 	    \State \qquad \textbf{for }$i\in N$ \textbf{do}
	    
% 	    \State \qquad \quad  $\mathsf{Z}_{s'}^{\alpha}(i)\gets r_i^A(s'_i,\alpha)+r_i^S(s'_i)+\sum_{s''\in F_{s'}^{\alpha}}\delta(s',\alpha)(s'')$
% 	\end{algorithmic}
% \end{algorithm}

% Next, we will discuss how to compute an SW-SPE for a given initial state, when two-agent nonzero-sum games are considered. The PWL assumption about observation functions should be utilized. The problem of computing an SW-SPE is addressed by transforming it into an optimization problem with mixed integer linear programming (MILP) constraints and linear complementarity problem (LCP) constraints. We assume that the reward functions $r_i^A$ and $r_i^S$ for $i\in N$, and the local environment transition function $\delta_E$ are all PWL.

% \textbf{MILP formulation of implication clauses.} Each implication clause $f(x)\leq 0\Rightarrow g(x)\leq0$ (where $f(x)$ and $g(x)$ are linear expressions) can be encoded with the following MILP constraints:

% \begin{equation*}
%     f(x)\ge -M\gamma+{\color{red}\frac{M}{2}},\qquad g(x)\leq M(1-\gamma),
% \end{equation*}
% where $\gamma$ is a binary variable and $M$ is a sufficiently large positive constant. Thus, if $f(x)\leq0$, then $\gamma=1$, which gives $g(x)\leq0$ by the second clause.

% \ruitodo{In \cite{MEA-EB-PK-AL:20}, the authors did not encode the implication clause to MILP correctly, because in their encoding the case $f(x)=0$ would lead to a contradiction. So, I add the red term to fix it.}




% We will encode the deterministic transition function $\delta$ to an MILP. We will first focus on the encoding of local transition function for each agent $i\in N$, which leads to the stochasticity. Actually, this encoding method is suitable for any stochastic game with finite states and finite actions. For any local state $s_i\in S_i$, denote by $P_{\delta_i}(s_i)$ the set of action-private-state pairs $(\alpha,prv_i)\in A\times Prv_i$ such that $\delta_i(s_i,\alpha)(prv_i)>0$. 

% which is more complicated than the encoding of deterministic transition function in \cite{MEA-EB-PK-AL:20}. 

% Although the encoding of non-deterministic transition function has been considered in \cite{MEA-EB-PK-AL:20-2}, the authors assume that the transition function is given by many MILP constraints. However, it is not straightforward from the results in \cite{MEA-EB-PK-AL:20-2} to encode the transition functions of stochastic games with finite states and finite actions to MILP, while we need such an encoding here for our problem.

% We will first focus on the encoding of local transition function for each agent $i\in N$, which leads to the stochasticity. Actually, this encoding method is suitable for any stochastic game with finite states and finite actions. For any local state $s_i\in S_i$, denote by $P_{\delta_i}(s_i)$ the set of action-private-state pairs $(\alpha,prv_i)\in A\times Prv_i$ such that $\delta_i(s_i,\alpha)(prv_i)>0$. 


% \begin{figure*}[t]
% \begin{equation}\label{eq-encoding-PLTF}
%      C_i(\bm{x},\bm{a},\bm{y})=\bigcap_{\bm{x}_1\in S_i}\Big(\bigcap_{(\bm{a}_1,\bm{y}_1)\in P_{\delta_i}(\bm{x}_1)}\{\kappa_{\bm{a}_1,\bm{y}_1}=1\}\Rightarrow\big(\{\bm{x}=\bm{x}_1\}\Rightarrow(\{\bm{a}=\bm{a}_1\}\cap\{\bm{y}=\bm{y}_1\})\big)\cap \sum_{(\bm{a}_1,\bm{y}_1)\in P_{\delta_i}(\bm{x}_1)}\kappa_{\bm{a}_1,\bm{y}_1}=1\Big)
% \end{equation}
% \end{figure*} 





% \begin{lema}[Encoding of probabilistic local transition function]
% The encoding $C_i(\bm{x},\bm{a},\bm{y})$ induced by $\delta_i$ is an MILP. Given a local state $s_i\in S_i$, a joint action $\alpha\in A$ and a private state $prv_i'\in Prv_i$, then $\delta_i(s_i,\alpha)(prv_i')>0$ iff there is an assignment $\mathfrak{a}$ to $vars(C_i(\bm{x},\bm{a},\bm{y}))$ such that $s_i=\mathfrak{a}(\bm{x})$, $\alpha=\mathfrak{a}(\bm{a})$, $prv_i'=\mathfrak{a}(\bm{y})$ and $\mathfrak{a}\models C_i(\bm{x},\bm{a},\bm{y})$.
% \end{lema}


% \textbf{Exact computation of SW-SPE.} Next, we will discuss how to compute an SW-SPE given an initial state. The problem of computing an SW-SPE is addressed by transforming it into an MIQCP. 

% \begin{pro}[Computation of SW-SPE]
% For a two-agent NS-SCG $\csg$ with deterministic transitions and an initial state $s\in  S$, if fully-observable strategies are adopted, then a strategy profile $\sigma^*$ is an SW-SPE iff there exists a solution $(s^{\pi*},\mu^{\pi*}_1,\mu^{\pi*}_2,\mu^{\pi*}_E,V^{\pi*}_{\csg})$ $( \pi\in P_{k,s},k\in[K])$ of the optimization problem
% \begin{subequations}
%     \begin{align*}
%       &\textup{maximize}&& 
%       \sum_{i\in N}V_{\csg}^s(i)\\
%       & \textup{variables}&& \mu^\pi_1,\mu^\pi_2,\mu^\pi_E,V^{\pi}_{\csg} \textup{ for } \pi\in P_{k,s},k\in[K]\\
%       & \textup{subject to}&& \eqref{eq:SPE-constraints} \textup{ for } \pi\in P_{k,s},k\in[K],
% \end{align*}
% \end{subequations}
% such that $\sigma_i^*(\pi)=\mu_i^{\pi}$ holds for $i\in N\cup\{E\}$, $\pi\in P_{k,s}$ and $k\in[K]$. The resulting social welfare $W_{\csg,s}^{\sigma^*}$ is equal to the optimal value $\sum_{i\in N}V_{\csg}^{s^*}(i)$.
% \end{pro}




% \subsubsection{Two-Agent NS-CSGs with Probabilistic Transition}

% We now consider the probabilistic transition function $\delta:S\times A\to Dist(S)$, as in \defiref{defi:NS-CSG}. We show that under mild conditions, the problem of computing an SW-SPE can also be addressed by an MIQCP. Different from the deterministic case, the probabilistic transition requires more complex encoding of the reward computation.

% \textbf{Notation.} We assume that for each $i\in N$, the probabilistic local transition function $\delta_i$ determines $b_i\in\mathbb{Z}$ possible next privates given its current local state and the joint action, given by $b_i$ functions $tr_i^1$, $\cdots$, $tr_i^{b_i}$ mapping from $S_i\times A$ to $Prv_i$. This is a common assumption to deal with non-deterministic transitions as in \cite{MEA-EB-PK-AL:20-2}, which can characterize a broad class of probabilistic models. We denote by $tr^{j_1j_2}$ the global transition function $tr$ when $tr_1^{j_1}$ and $tr_2^{j_2}$ are selected.

% \textbf{Exact Computation of SW-SPE.} The transition function is probabilistic, so for each history $h\in H_s$ of length $|h|<K$, different from  \eqref{eq:reward-constraints}, the reward constraint becomes
% \begin{equation}\label{eq:reward-constraints-stoc}
% \begin{aligned}
%     \mathsf{Z}^{h,\alpha}(i)= \ &r_i^A(last(h),\alpha)+r_i^S(last(h))\\
%     &+\sum_{j_1\in[b_1],j_2\in[b_2]}\delta^{h,\alpha}_{j_1j_2}V^{h,\alpha}_{j_1j_2}(i),
% \end{aligned}
% \end{equation}
% where $\delta^{h,\alpha}_{j_1j_2}\ge0$ is the probability of selecting local transition functions $tr_1^{j_1}$ and $tr_2^{j_2}$ for two agents when $\alpha$ is adopted at $h$, satisfying $\sum_{j_1\in[b_1],j_2\in[b_2]}\delta^{h,\alpha}_{j_1j_2}=1$, and $V^{h,\alpha}_{j_1j_2}(i)$ is the corresponding accumulated expected reward of agent $i$ once $tr_1^{j_1}$ and $tr_2^{j_2}$ are chosen. During the game tree construction for the stochastic transition, the value of $\delta_{j_1j_2}^{h,\alpha}$ can be computed by $\delta$. Let $V^{h,\alpha}=\{V^{h,\alpha}_{j_1j_2}\}_{j_1\in[b_1],j_2\in[b_2]}$. We denote by $C_r^P(\{\mathsf{Z}^{h,\alpha},V^{h,\alpha}\}_{\alpha\in A})$ the set of linear constraints in \eqref{eq:reward-constraints-stoc}. 

% Since $\delta^{\pi,\alpha}_{j_1j_2}$ is connected to $s^{\pi}$ by the probabilistic transition function $\delta$, we have the set of MILP constraints $C_{\delta}(s^{\pi},\delta^{h,\alpha})$ in .

% \begin{figure*}[t]
% \begin{equation}\label{eq:c-delta-stoc}
% \begin{aligned}
%     C_{\delta}(s^{\pi},\delta^{\pi,\alpha})=\bigcup_{s_1\in S_1,s_2\in S_2}\{\kappa_{s_1,s_2}=1\}\Rightarrow\big(s^{\pi}_1=s_1,s^{\pi}_2=s_2,\delta^{\pi,\alpha}_{j_1j_2}=\delta((s_1,s_2),\alpha)(j_1,j_2)\big)\bigcup\sum_{s_1\in S_1,s_2\in S_2}\kappa_{s_1,s_2}=1.
% \end{aligned}
% \end{equation}
% \end{figure*} 

% As for the global state transitions, there are $b_1b_2$ possible next states provided that $\alpha$ is adopted at path $\pi$. Similar to the deterministic case, if $tr_1^{j_1}$ and $tr_2^{j_2}$ are chosen for the transition, then the global transition function $tr^{j_1j_2}$ can be encoded by a set of MILP constraints $C_{tr}(s^{\pi},\{s^{\pi,\alpha,j_1,j_2}\}_{\alpha\in A})$, where for a joint action $\alpha\in A$, $s'=tr^{j_1j_2}(s,\alpha)$ iff there is an assignment $\mathfrak{a}$ to $vars(C_{tr}(s^{\pi},\{s^{\pi,\alpha,j_1,j_2}\}_{\alpha\in A}))$ such that $s=\mathfrak{a}(s^{\pi})$, $s'=\mathfrak{a}(s^{\pi,\alpha,j_1,j_2})$ and $\mathfrak{a}\models C_{tr}(s^{\pi},\{s^{\pi,\alpha,j_1,j_2}\}_{\alpha\in A}))$. Let $s^{\pi,\alpha}=\{s^{\pi,\alpha,j_1,j_2}\}_{j_1\in[b_1],j_2\in[b_2]}$. 

% \begin{figure*}[t]
% \begin{equation}\label{eq:c-tr-stoc}
% \begin{aligned}
%     C_{tr}(s^{\pi},\{s^{\pi,\alpha}\}_{\alpha\in A})=\bigcup_{j_1\in[b_1],j_2\in[b_2]}\{\kappa_{j_1j_2}=1\}\Rightarrow C_{tr}^{\pi}(s^{\pi},\{s^{\pi,\alpha,j_1,j_2}\}_{\alpha\in A})\bigcup \sum_{j_1\in[b_1],j_2\in[b_2]}\kappa_{j_1j_2}=1.
% \end{aligned}
% \end{equation}
% \end{figure*}

% Combining \eqref{eq:SPE-constraints} and \eqref{eq:reward-constraints-stoc}, we define a set of MIQCP constraints for each history $h\in H_s$ of length $|h|<K$:
% \begin{equation*}
%     \begin{aligned}
%         &C^{h}(\mu_1^{h},\mu_2^{h},V^{h},\{V^{h,\alpha}\}_{\alpha\in A})\\
%         &=C_{\textup{NE}}(\mu_1^{h},\mu_2^{h},V^{h},\{\mathsf{Z}^{h,\alpha}\}_{\alpha\in A})\cup C_{r}^P(\{\mathsf{Z}^{h,\alpha},V^{h,\alpha}\}_{\alpha\in A}).
%     \end{aligned}
% \end{equation*}
% The union of the above sets of MIQCP constraints for all such histories is denoted by $C_{\textup{P}}(\mu,V)$.

% \begin{thom}[Computation of SW-SPE]\label{thom-SW-SPE-P}For a two-agent NS-SCG $\csg$ with stochastic transition function $\delta$ and an initial state $s\in  S$, if fully-observable strategies are adopted, then 
% \begin{enumerate}
%     \item a strategy profile $\sigma$ is an SPE iff there exists an assignment $\mathfrak{a}$ to $vars(C_{\textup{P}}(\mu,V))$ such that $\sigma_1(h)=\mathfrak{a}(\mu_1^{h})$ and $\sigma_2(h)=\mathfrak{a}(\mu_2^{h})$ for each history $h\in H_s$ with $|h|<K$ and $\mathfrak{a}\models C_{\textup{P}}(\mu,V)$;
    
%     \item a strategy profile $\sigma$ is an SW-SPE iff there exists a solution $\{\mu_1^{*h},\mu_2^{*h},V^{*h}\}_{h\in H_s,|h|<K}$ of the MIQCP
%     \begin{subequations}\label{eq:MIQCP-Stoc}
%     \begin{align}
%       &\textup{maximize}&& 
%       \sum_{i\in N}V^s(i)\\
%       & \textup{variables}&& \mu,V,\\
%       & \textup{subject to}&& C_{\textup{P}}(\mu,V),
%     \end{align}
%     \end{subequations}
%     such that $\sigma_1(h)=\mu_1^{*h}$ and $\sigma_2(h)=\mu_2^{*h}$ for each $h\in H_s$ with $|h|<K$. The resulting social welfare $W_{1,s}^{\sigma}$ is equal to the optimal value $\sum_{i\in N}V^{s}(i)$.
% \end{enumerate}
% \end{thom}

% \section{Approximation algorithms}
\section{Frozen Subgame Improvement} \label{sec:approx_algo}

\lemaref{lema:bad-sw} indicates that a GBI-based approach
does not guarantee optimal social welfare.
Motivated by this, we now consider further techniques to
synthesize SW-SPNE and SW-SPCE for NS-CSGs.
%
We first present an \emph{exact} approach
based on an unfolding of the game tree and the solution of a \emph{nonlinear program}.
Since this does not scale to large games,
we then propose an iterative \emph{approximation} method called \emph{frozen subgame improvement}.
This works by first finding an arbitrary initial SPNE or SPCE
and then iteratively freezing a set of variables and computing a new SPNE or SPCE with an increasing social welfare.

%for approximation with a decreasing error bound over time.

In this section, we focus initially on the case of \emph{two-agent} NS-CSGs
and then later discuss how to generalise this.

% noting that finding an NE of a two-agent game is no easier than finding an NE of an $n$-agent game \citep{XC-XD-ST:09}. %which would not hurt the generality of the results since finding an NE of a two-agent game is no easier than finding an NE of an $n$-agent game. 
%We provide the hint of extending to $n$-agent games at the end of this section for the interested readers to refer to.
%Regarding the computation, 

% \marta{Suggest move to before Defn 5 and define zero-sum and nonzero-sum before SPE}A two-agent NS-CSG is \emph{zero-sum} if $r_1^A(s,\alpha) + r_1^S(s) + r_2^A(s, \alpha) + r_2^S(s)=0$ for all $s \in S$ and all $\alpha \in A$.} %\marta{explain what happens to n-player}




%\textbf{Notation.} 
% We first introduce some notation. For each history $h\in H_s^{<K}$, we define a tuple of variables $(\mu^{h}_1,\mu^h_2,V^{h})\in \mathbb{P}(A_1(last(h)))\times \mathbb{P}(A_2(last(h)))\times \mathbb{R}^2$, where $\mu_i^h$ is a distribution over the actions $A_i(last(h))$, and $V^h$ denotes the expected accumulated reward vector from $h$ to the end of the game. For each history $h\in H_s^K$, we take the reward vector $V^h=(r_1^S(last(h)),r_2^S(last(h)))$. 

% If the local transition function is deterministic, we define $tr_i:S_i\times A_1\times A_2\times A_E\to Prv_i$ for each agent $i\in N$, inducing a global deterministic transition function $tr:S\times A\to S$, where for $s=(s_1,s_2,s_E)\in S$, $\alpha\in A$, and $s'=(s_1',s_2',s_E')\in S$, we have $s'=tr(s,\alpha)$ iff $s_E'=\delta_E(s_E,\alpha)$, $s_i'=(prv_i',per_i')$, $prv'_i=tr_i(s_i,\alpha)$ and $per_i'=obs_i((prv_i',per_i),s_E')$. If the local transition function is probabilistic, we assume that for each $i\in N$, the probabilistic local transition function $\delta_i$ determines $b_i\in\mathbb{Z}$ possible next privates given its current local state and the joint action, given by $b_i$ functions $tr_i^1$, $\cdots$, $tr_i^{b_i}$ mapping from $S_i\times A$ to $Prv_i$. This is a common assumption to deal with non-deterministic transitions as in \cite{MEA-EB-PK-AL:20-2}, which can characterize a broad class of probabilistic models. We denote by $tr^{j_1j_2}$ the global transition function $tr$ when $tr_1^{j_1}$ and $tr_2^{j_2}$ are selected.

% 

%\ruirev{Given a nonlinear program $\zeta$, we denote by $vars(\zeta)$ the set of variables in $\zeta$. A \emph{variable assignment} $\mathfrak{a}$ for a nonlinear program $\zeta$ is a function $\mathfrak{a}:vars(\zeta)\to\mathbb{R}$ that assigns a specific real value to each variable in $\zeta$. We write $\mathfrak{a}\models \zeta$ to mean that the assignment by $\mathfrak{a}$ satisfies $\zeta$. }
%\rui{Please check whether we need to remove the above startpara, because these notations are only used in the first conclusion of Theorem 10, so that we can save some space.}


% Given an MIQCP (or MILP) $\zeta$, we denote by $vars(\zeta)$ the set of variables in $\zeta$. A \emph{variable assignment} $\mathfrak{a}$ for an MIQCP (or MILP) $\zeta$ is a function $\mathfrak{a}:vars(\zeta)\to\mathbb{R}$ that assigns a specific (binary, integer or real) value to each variable in $\zeta$. We write $\mathfrak{a}\models \zeta$ to mean that the assignment by $\mathfrak{a}$ satisfies $\zeta$. In order to handle with the disjunctive cases, we use the \emph{indicator constraints} of the form $(\kappa=v)\Rightarrow c$, for a binary variable $\kappa$, a binary value $v\in\{0,1\}$ and an MIQCP (or MILP) constraint $c$, meaning that if $\kappa=v$, then $c$ should hold. Furthermore, we use $(\kappa=v)\Rightarrow \zeta$ to denote the indicator constraints for a set of MIQCP (or MILP) constraints $\zeta$, that is, if $\kappa=v$, then $c$ should hold for all $c\in \zeta$. 

% Given an MILP $\pi$, we denote by $vars(\pi)$ the set of variables in $\pi$. A \emph{variable assignment} $\mathfrak{a}$ for an MILP $\pi$ is a function $\mathfrak{a}:vars(\pi)\to\mathbb{R}$ that assigns a specific (binary, integer or real) value to each variable in $\pi$. We write $\mathfrak{a}\models \pi$ to mean that the assignment by $\mathfrak{a}$ satisfies $\pi$. In order to handle with the disjunctive cases, we use the \emph{indicator constraints} of the form $(\kappa=v)\Rightarrow c$, for a binary variable $\kappa$, a binary value $v\in\{0,1\}$ and an MILP constraint $c$, meaning that if $\kappa=v$, then $c$ should hold. Furthermore, we use $(\kappa=v)\Rightarrow \pi$ to denote the indicator constraints for a set of MILP constraints $\pi$, that is, if $\kappa=v$, then $c$ should hold for all $c\in \pi$.

% \martatodo{In what way does this approach differ for neurosymbolic games, or is this still the same as for conventional stochastic games?}

% \ruitodo{Essentially, it is the same for conventional stochastic games.}

\startpara{Exact Computation of SW-SPNE and SW-SPCE} 
Given an initial state $s\in S$, the game unfolds by considering all paths, thus generating a game tree which can be fully characterized by $H_s$. During the game tree construction, $last(h)$ can be computed for any $h\in H_s$, and if $h'$ is a successor of $h$, the joint action(s) that lead to $h'$ from $h$ can be determined. %\gabrieltodo{Needs to be split into two cases.}
% Note that, 
In contrast to \citep{MEA-EB-PK-AL:20}, where perception functions are assumed to be piecewise linear and encoded as constraints, unfolding the game tree allows us to treat NNs outside the optimisation problem.

% Next, we will discuss how to compute an SW-SPE given an initial state. The problem of computing an SW-SPE is addressed by first constructing a game tree and then encoding the SPE constraints into an MIQCP. 

% \martatodo{What if discounting is considered?}

% \ruitodo{Essentially, our algorithms also apply to the discounting cases.}



% As for the SPE constraints, we will first encode the deterministic transition function $tr$ to an MILP. We assume that the local transition functions $tr_i$ and $\delta_E$, and the observation functions $obs_i$ are all PWL. Then, the global transition function $tr$ is PWL. Each implication clause $f(x)\leq 0\Rightarrow g(x)\leq0$ (where $f(x)$ and $g(x)$ are linear expressions) can be encoded with the following MILP constraints:
% \begin{equation*}
%     f(x)\ge -M\gamma+{\color{red}\frac{M}{2}},\qquad g(x)\leq M(1-\gamma),
% \end{equation*}
% where $\gamma$ is a binary variable and $M$ is a sufficiently large positive constant. Thus, if $f(x)\leq0$, then $\gamma=1$, which gives $g(x)\leq0$ by the second clause. Thus, by taking each linear interval as an implication clause, the PWL global transition function $tr$ starting at path $\pi\in P_{[K],s}$ can be encoded by a set of MILP constraints $C_{tr}(s^{\pi},\{s^{\pi,\alpha}\}_{\alpha\in A})$, where for a joint action $\alpha\in A$, $s'=tr(s,\alpha)$ iff there is an assignment $\mathfrak{a}$ to $vars(C_{tr}(s^{\pi},\{s^{\pi,\alpha}\}_{\alpha\in A}))$ such that $s=\mathfrak{a}(s^{\pi})$, $s'=\mathfrak{a}(s^{\pi,\alpha})$ and $\mathfrak{a}\models C_{tr}(s^{\pi},\{s^{\pi,\alpha}\}_{\alpha\in A})$.

% \martatodo{Add some brief intuition for how the constraints are derived}
% \martatodo{Explain where the neural components are dealt with}
% \ruitodo{Gabriel, please can you add the explanation here}

%As for the constraints, we will first encode subgame perfection as a nonlinear program. An SPNE (SPCE, resp.) of the original game is an NE (CE, resp.) of every subgame of itself, so for each history $h\in H_s^{<K}$ it can be encoded as follows: for SPNE, we have:
We encode subgame perfection as a nonlinear program. An SPNE of the original game induces an NE in every subgame, which, for each history $h\in H_s^{<K}$, can be encoded as follows\footnote{To simplify notation, $a_i \in A_i$ refers to $a_i \in A_i(last(h))$ in \eqref{eq:computation-NE} and \eqref{eq:computation-CE}, and similarly for $a_j$ and $a_i'$.}:
\begin{equation}\label{eq:computation-NE}
    \begin{aligned}
\!\!\!\!V^h_i{-}\mbox{$\sum\nolimits_{(a_i,a_j) \in A_i {\times} A_j}$} \mu^h_i(a_i) \cdot \mu^h_j(a_j) \cdot \mgame_i^{h,(a_i,a_j)} &= 0 \\
\!\!\!\!V^h_i{-}\mbox{$\sum\nolimits_{a_j \in A_j}$} \mu^h_j(a_j) \cdot \mgame^{h,(a_i,a_j)}_i \geq 0,\ \forall a_i &\in A_i\\
\!\!\!\!\mbox{$\sum\nolimits_{a_i \in A_i}$} \mu^h_i(a_i) = 1,\ \mu^h_i(a_i) &\geq 0
    \end{aligned}
\end{equation}
for $i,j \in \{1,2\}, i \neq j$, where $\mu_i^h \in \mathbb{P}(A_i(last(h)))$, $V^h = (V^h_1, V^h_2) \in \mathbb{R}^2$ denotes the expected accumulated reward vector from $h$ to the end of the game, and $\mathsf{Z}^{h,\alpha}_i$ denotes the expected accumulated reward to be received by $\agent_i$ after executing the joint action $\alpha$ at $h$. 
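
As a small illustration of \eqref{eq:computation-NE} (the action names $a,b$ and the reward values below are hypothetical and chosen only for this example), suppose that at some history $h$ both agents have actions $\{a,b\}$ and that $\mathsf{Z}^{h,(a,a)}_i=2$, $\mathsf{Z}^{h,(b,b)}_i=1$ and $\mathsf{Z}^{h,(a,b)}_i=\mathsf{Z}^{h,(b,a)}_i=0$ for $i\in\{1,2\}$. Treating these values as constants, \eqref{eq:computation-NE} has exactly three solutions:
\begin{equation*}
\begin{aligned}
\mu^h_1(a)=\mu^h_2(a)&=1, & V^h&=(2,2),\\
\mu^h_1(a)=\mu^h_2(a)&=0, & V^h&=(1,1),\\
\mu^h_1(a)=\mu^h_2(a)&=\tfrac{1}{3}, & V^h&=(\tfrac{2}{3},\tfrac{2}{3}).
\end{aligned}
\end{equation*}
For the third solution, the first constraint gives $V^h_i=\tfrac{1}{3}\cdot\tfrac{1}{3}\cdot 2+\tfrac{2}{3}\cdot\tfrac{2}{3}\cdot 1=\tfrac{2}{3}$, and both inequality constraints hold with equality, since $\tfrac{2}{3}-\tfrac{1}{3}\cdot 2=0$ and $\tfrac{2}{3}-\tfrac{2}{3}\cdot 1=0$. All three solutions are feasible, but their social welfare values $4$, $2$ and $\tfrac{4}{3}$ differ, which is why a welfare objective is added on top of the constraints in \eqref{eq:MIQCP-Dete-ne}.
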
% \marta{say where nonlinearity is} \rui{added}
In an SPCE, no agent can gain by deviating from the recommendation in any given history, and thus we have:
\begin{equation}\label{eq:computation-CE}
    \begin{aligned}
V_i^h{-}\mbox{$\sum_{\alpha \in A}$} \mu_{\alpha}^h \cdot \mgame_i^{h,\alpha}& = 0 \\
\mbox{$\sum\nolimits_{a_j \in A_j}$}(\mgame_i^{h,(a_i, a_j)}{-}\mgame_i^{h, (a'_i, a_j)}) \cdot \mu^h_{(a_i, a_j)} &\geq 0 \\
\mbox{$\sum\nolimits_{\alpha \in A}$} \mu^h_{\alpha} = 1, \quad 
\mu^h_{\alpha} & \ge 0
    \end{aligned}
\end{equation}
where 
% $V^h_i(\alpha) = \mgame_i^{h,\alpha}$,
$i,j \in \{1,2\}$, $i \neq j$, %joint action $\alpha \in \Delta(last(h))$, 
$a_i, a'_i \in A_i$,
% , where $A_{-i} = \{\alpha_{-i}\ |\ \alpha \in \mathbb{P}(last(h)\}$. 
$\mu^h = \{\mu_\alpha^h\}_{\alpha \in A}$ and $\mu_\alpha^h$ represents the probability of the joint action $\alpha$ being recommended at $h$.
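
To illustrate \eqref{eq:computation-CE} on a hypothetical instance (the action names $c,d$ and all reward values are assumptions made for this example), let both agents have actions $\{c,d\}$ at $h$, write joint actions as $(a_1,a_2)$, and let
\begin{equation*}
(\mathsf{Z}^{h,\alpha}_1,\mathsf{Z}^{h,\alpha}_2)=
\begin{cases}
(6,6) & \alpha=(c,c),\\
(2,7) & \alpha=(c,d),\\
(7,2) & \alpha=(d,c),\\
(0,0) & \alpha=(d,d).
\end{cases}
\end{equation*}
The distribution $\mu^h_{(c,c)}=\mu^h_{(c,d)}=\mu^h_{(d,c)}=\tfrac{1}{3}$, $\mu^h_{(d,d)}=0$ satisfies the incentive constraints. For $\agent_1$ with $a_i=c$, $a'_i=d$ we get $(6-7)\cdot\tfrac{1}{3}+(2-0)\cdot\tfrac{1}{3}=\tfrac{1}{3}\geq 0$, and with $a_i=d$, $a'_i=c$ we get $(7-6)\cdot\tfrac{1}{3}+(0-2)\cdot 0=\tfrac{1}{3}\geq 0$; the constraints for $\agent_2$ hold by symmetry. The first constraint then yields $V^h_i=\tfrac{1}{3}(6+2+7)=5$ for both agents, i.e., a social welfare of $10$, whereas the two pure NEs $(c,d)$ and $(d,c)$ of the same normal-form game each have social welfare $9$.
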
% a joint strategy corresponding to the correlated profile selecting the joint action $\alpha$ at $h$.

% \ruitodo{Modified. Please check. Also in order to align with \eqref{eq:computation-NE}, $-i$ is replaced by $j$}

% \ruitodo{I'm a little confused about the definition of $V^h_i(\alpha)$. It's unclear what $\alpha{_i}[a_i]$ means}

% \ruitodo{Gabriel, please can you add some words here explaining how to transfer the original polynomial program into a quadratic program in our implementation?}

% \rev{We slightly abuse the notation and write $V_i^h$ instead of $V^h(i)$ and $\mathsf{Z}^{h,(a_i,a_j)}_i$ instead of $\mathsf{Z}^{h,(a_i,a_j)}(i)$.} 


% Finding NE in two-agent nonzero-sum finite normal-form games can be addressed by a linear complementarity problem (LCP) Specifically, the feasible solutions of the LCP are exactly the NE of the game.  \cite{YS-KLB:08}.
% \gabriel{might be worth looking at \cite{SGC05} for different MIP encodings for one-shot games.}
% \begin{equation}\label{eq:SPE-constraints}
%     \begin{aligned}
%          &V^{h}(1)=\sum_{a_2\in A_2}\mu_2^{h}(a_2)\mathsf{Z}^{h,(a_1,a_2)}(1)+\lambda_{a_1}^{h},\quad \forall a_1\in A_1,\\
%          &V^{h}(2)=\sum_{a_1\in A_1}\mu_1^{h}(a_1)\mathsf{Z}^{h,(a_1,a_2)}(2)+\lambda_{a_2}^{h},\quad \forall a_2\in A_2,\\
%          & \sum_{a_1\in A_1}\mu_1^{h}(a_1)=1, \quad \sum_{a_2\in A_2}\mu_2^{h}(a_2)=1,\\
%          & \mu_1^{h}(a_1)\ge0,\,\lambda_{a_1}^{h}\ge0,\, \lambda_{a_1}^{h}\mu_1^{h}(a_1)=0,\hspace{1.5em}  \forall a_1\in A_1,\\
%          & \mu_2^{h}(a_2)\ge0,\,\lambda_{a_2}^{h}\ge0,\, \lambda_{a_2}^{h}\mu_2^{h}(a_2)=0,\hspace{1.5em}  \forall a_2\in A_2,
%     \end{aligned}
% \end{equation}
% where $\mathsf{Z}^{h,(a_1,a_2)}(i)$ denotes the expected accumulated reward to be received by agent $i$ after executing the joint action $(a_1,a_2)$ at $h$. The encoding \eqref{eq:SPE-constraints} is a set of QCP constraints over the variables $\mu_1^{h}$, $\mu_2^{h}$, $V^{h}$, $\mathsf{Z}^{h,(a_1,a_2)}$, $\lambda_{a_1}^{h}$ and $\lambda_{a_2}^{h}$. After omitting the internal variables $\lambda_{a_1}^{h}$ and $\lambda_{a_2}^{h}$, we denote by $C_{\textup{NE}}^h(\mu_1^{h},\mu_2^{h},V^{h},\{\mathsf{Z}^{h,\alpha}\}_{\alpha\in A})$ the set of QCP constraints in \eqref{eq:SPE-constraints}.

% \gabrieltodo{Is there a particular reason why this encoding was preferred instead of that of the FMSD paper or one in \cite{SGC05}? I think it could be alternatively encoded as:}

% {\color{orange}

% \begin{equation*}
%     \begin{aligned}
% V^h_i - \sum\limits_{(a_i,a_j) \in A} \mu^h_i(a_i) {\cdot} \mu^h_j(a_j) {\cdot} \mgame_i^{h,(a_i,a_j)} &= 0 \\
% V^h_i - \sum\limits_{a_j \in A_j} \mu^h_j(a_j) \cdot \mgame^{h,(a_i,a_j)}_i \geq 0, \forall a_i &\in A_i\\
% \sum_{a_i \in A_i} \mu^h_i(a_i) = 1, \mu^h_i(a_i) &\geq 0
%     \end{aligned}
% \end{equation*}

% With the notation $V^h_i = V^h(i), \mgame^{h,(a_i,a_j)}_i = \mgame^{h,(a_i,a_j)}(i)$, without the need of the $\lambda$ variables and the corresponding constraints, $\forall i,j \in \{1,2\}, i \neq j$.

% }

% \ruitodo{I agree with you on the encoding with a smaller number of variables. I thought that the encoding in \eqref{eq:SPE-constraints} is quadratic before. However, as your simpler encoding indicates, the constraints are essentially nonlinear (or polynomial of order higher than two) due to the existence of $\mathsf{Z}$. So, we should use the encoding you referred to.}

% \martatodo{There is a lot of notation, not easy to follow}

% We assume that the reward functions $r_i^A$ and $r_i^S$ for $i\in N$ are linear. 

For both SPNE and SPCE, for each $h\in H_s^{<K}$ and $\alpha\in A(last(h))$, the expected accumulated reward for $\agent_i$ satisfies:
\begin{equation}\label{eq:reward-constraints}
\begin{aligned}
     \mathsf{Z}^{h,\alpha}_i=\ \ & r_i^A(last(h),\alpha)+r_i^S(last(h))\\
     &+\mbox{$\sum\nolimits_{h'\in \textup{Succ}(h)}$}\delta(last(h),\alpha)(last(h'))V^{h'}_i
\end{aligned}
\end{equation}
where, for each history $h\in H_s^K$, we take the reward vector $V^h=(r_1^S(last(h)),r_2^S(last(h)))$.
 For each $h\in H_s^{<K}$, let $C^{\textup{N}, h}(\mu_1^{h},\mu_2^{h},V^{h},\{V^{h'}\}_{h'\in \textup{Succ}(h)})$ be the union of constraints \eqref{eq:computation-NE} and \eqref{eq:reward-constraints} (for Nash equilibria), and  $C^{\textup{C}, h}(\mu^{h},V^{h},\{V^{h'}\}_{h'\in \textup{Succ}(h)})$ be the union of constraints \eqref{eq:computation-CE} and \eqref{eq:reward-constraints} (for correlated equilibria). The union of $C^{\textup{N},h}$ over all such histories is denoted by $C^{\textup{N}}(\mu^{\textup{N}},V)$ and the union of $C^{\textup{C},h}$ by $C^{\textup{C}}(\mu^{\textup{C}},V)$, where $\mu^{\textup{N}}:=\{\mu_1^h,\mu_2^h\}_{h\in H_s^{<K}}$, $\mu^{\textup{C}}:=\{\mu^h\}_{h\in H_s^{<K}}$ and $V:=\{V^h\}_{h\in H_s^{<K}}$. Note that $C^{\textup{N}}(\mu^{\textup{N}},V)$ ($C^{\textup{C}}(\mu^{\textup{C}},V)$, resp.) is polynomial in $\mu^{\textup{N}}$ ($\mu^{\textup{C}}$, resp.) and $V$, and is nonlinear because, by \eqref{eq:reward-constraints}, $\mathsf{Z}^{h,\alpha}_i$ depends on the variables $V_i^{h'}$ for $h' \in \textup{Succ}(h)$, which are multiplied by the probability variables.
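
To make the source of the nonlinearity explicit, one can substitute \eqref{eq:reward-constraints} into the first constraint of \eqref{eq:computation-NE}, which gives
\begin{equation*}
\begin{aligned}
V^h_i = \mbox{$\sum\nolimits_{(a_i,a_j) \in A_i {\times} A_j}$}\ \mu^h_i(a_i)\, \mu^h_j(a_j) \Big( & r_i^A(last(h),(a_i,a_j))+r_i^S(last(h)) \\
& +\mbox{$\sum\nolimits_{h'\in \textup{Succ}(h)}$}\delta(last(h),(a_i,a_j))(last(h'))\,V^{h'}_i \Big).
\end{aligned}
\end{equation*}
The rewards and transition probabilities are constants obtained while unfolding the game tree. Hence, for $h\in H_s^{\ell}$ with $\ell<K-1$, each term of the inner sum is a product of the three variables $\mu^h_i(a_i)$, $\mu^h_j(a_j)$ and $V^{h'}_i$, so the constraint is cubic. For $h\in H_s^{K-1}$, the successors lie in $H_s^{K}$, where $V^{h'}$ is fixed, and the constraint is bilinear in $\mu^h_1$ and $\mu^h_2$. The same substitution into \eqref{eq:computation-CE} produces only products of the form $\mu^h_\alpha \cdot V^{h'}_i$, so the correlated case is at most quadratic.
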

% We denote by $C_{r}^h(\{\mathsf{Z}^{h,\alpha}\}_{\alpha\in A},\{V^{h'}\}_{h'\in\textup{Succ}(h)})$ the set of linear constraints in \eqref{eq:reward-constraints} for all $\alpha\in A$.

% where for path $\pi\in P_{k,s}$ and join action $(a_1,a_2)\in\Delta(s')$, $F_{\pi}^{a_1,a_2}$ is the set of successor paths $\pi'\in P_{k-1,s}$ such that we have $\delta(s',(a_1,a_2))(last(\pi'))>0$, and for any $\pi\in P_{0,s}$, let $V_{\csg}^{\pi}=0_{1\times 2}$. Also note that $P_{K,s}=\{s\}$.

% $\pi'$ is the successor path of path $\pi$ under the joint action $(a_1,a_2)$, that is, $\pi'=\pi\xrightarrow{(a_1,a_2)}s^{\pi'}$. 


%denotes the expected accumulated reward to be received by agent $i$ after executing the joint action $(a_1,a_2)$ at state $s'$.

% \textbf{Transition function encoding.} We will encode the deterministic transition function $\delta$ to an MILP. We assume that the local transition functions $\delta_i$ and $\delta_E$, and the observation functions $obs_i$ are all PWL. Then, the global transition function $\delta$ is PWL. Each implication clause $f(x)\leq 0\Rightarrow g(x)\leq0$ (where $f(x)$ and $g(x)$ are linear expressions) can be encoded with the following MILP constraints:
% \begin{equation*}
%     f(x)\ge -M\gamma+{\color{red}\frac{M}{2}},\qquad g(x)\leq M(1-\gamma),
% \end{equation*}
% where $\gamma$ is a binary variable and $M$ is a sufficiently large positive constant. Thus, if $f(x)\leq0$, then $\gamma=1$, which gives $g(x)\leq0$ by the second clause. Thus, by taking each linear interval as an implication clause, the PWL global transition function $\delta$ can be encoded by a set of MILP constraints $C(\bm{x},\bm{a},\bm{y})$, where for two global states $s\in S$, $s'\in S$ and a joint action $\alpha\in A$, $s'=\delta(s,\alpha)$ iff there is an assignment $\mathfrak{a}$ to $vars(C(\bm{x},\bm{a},\bm{y}))$ such that $s=\mathfrak{a}(\bm{x})$, $\alpha=\mathfrak{a}(\bm{a})$, $s'=\mathfrak{a}(\bm{y})$ and $\mathfrak{a}\models C(\bm{x},\bm{a},\bm{y})$.

\begin{thom}[Computation of SW-SPNE and SW-SPCE]\label{thom-SW-SPE-D}For a two-agent NS-CSG $\csg$ with an initial state $s\in  S$,
\begin{enumerate}[label=(\roman*)]
    \item\label{itm:spne} a strategy profile $\sigma$ is an SPNE %iff there exists a feasible 
    iff there is a 
    solution of the constraints $C^{\textup{N}}(\mu^{\textup{N}},V)$ such that $\sigma_1(h)=\mu_1^{h}$ and $\sigma_2(h)=\mu_2^{h}$ for each $h\in H_s^{<K}$;
    
    \item\label{itm:spce} a correlated profile $\tau$ is an SPCE %iff there exists a feasible 
    iff there is a solution of the constraints $C^{\textup{C}}(\mu^{\textup{C}},V)$ such that $\tau(h)=\mu^{h}$ for each $h\in H_s^{<K}$;
    
    %\item\label{itm:spe} a strategy profile $\sigma$ is an SPE iff there exists an assignment $\mathfrak{a}$ to $vars(C(\mu,V))$ such that $\sigma_1(h)=\mathfrak{a}(\mu_1^{h})$ and $\sigma_2(h)=\mathfrak{a}(\mu_2^{h})$ for each history $h\in H_s^{<K}$ and $\mathfrak{a}\models C(\mu,V)$;
    
    \item\label{itm:nonlinear-program-spne} a strategy profile $\sigma$ is an SW-SPNE %iff there exists 
    iff there is an optimal solution $(\mu^*,V^*)$ of the nonlinear program:
    \begin{equation}\label{eq:MIQCP-Dete-ne}
    \begin{aligned}
       \underset{\mu^{\textup{N}}, \, V}{\textup{max}}&& 
       \mbox{$\sum\nolimits_{i\in N}$} V^s_i \qquad \textup{subject to}&& C^{\textup{N}}(\mu^{\textup{N}},V)
    \end{aligned}
    \end{equation}
    such that 
    % \marta{notation: should there be a comma between * and h, as used later?}\rui{fixed} 
    $\sigma_1(h)=\mu_1^{*,h}$ and $\sigma_2(h)=\mu_2^{*,h}$ for each $h\in H_s^{<K}$, and the social welfare $W_{0,s}^{\sigma}$ is equal to the optimal value $\sum_{i\in N}V^{*,s}_i$;
    
    \item\label{itm:nonlinear-program-spce} a correlated profile $\tau$ is an SW-SPCE %iff there exists 
    iff there is an optimal solution $(\mu^*,V^*)$ of the nonlinear program:
    \begin{equation}\label{eq:MIQCP-Dete-ce}
    \begin{aligned}
       \underset{\mu^{\textup{C}}, \, V}{\textup{max}}&& 
       \mbox{$\sum\nolimits_{i\in N}$} V^s_i \qquad \textup{subject to}&& C^{\textup{C}}(\mu^{\textup{C}},V)
    \end{aligned}
    \end{equation}
    such that $\tau(h)=\mu^{*,h}$ for each $h\in H_s^{<K}$, and the social welfare $W_{0,s}^{\tau}$ is equal to the optimal value $\sum_{i\in N} V^{*,s}_i$.
\end{enumerate}

% \gabrieltodo{If we want to also be able to minimise the value of an objective, i.e. minimise time in the parking game example, social cost would has to be introduced or the rewards for the examples that we'll want to do that for have to be negated.}

% \ruitodo{Yes, I agree. The easiest way for us is to negate the rewards as you suggested. I have reassigned the reward values.}

% The encoding $C_i(\bm{x},\bm{a},\bm{y})$ induced by $\delta_i$ is an MILP. Given a local state $s_i\in S_i$, a joint action $\alpha\in A$ and a private state $prv_i'\in Prv_i$, then $\delta_i(s_i,\alpha)(prv_i')>0$ iff there is an assignment $\mathfrak{a}$ to $vars(C_i(\bm{x},\bm{a},\bm{y}))$ such that $s_i=\mathfrak{a}(\bm{x})$, $\alpha=\mathfrak{a}(\bm{a})$, $prv_i'=\mathfrak{a}(\bm{y})$ and $\mathfrak{a}\models C_i(\bm{x},\bm{a},\bm{y})$.
\end{thom}
% \begin{proof}
% We postpone to the proof to \appxref{sec:appendix-a}.
% \end{proof}

% \martatodo{Add formal statement of complexity}

%Problem \eqref{eq:MIQCP-Dete-ne} has at most $(|A_1|+|A_2|+2)v$ variables and $(2|A_1||A_2|+2|A_1|+2|A_2|+4)v$ constraints, and \eqref{eq:MIQCP-Dete-ce} has at most $(|A_1||A_2|+2)v$ variables and $(|A_1||A_2| + |A_1|^2 + |A_2|^2 - |A_1| -|A_2| + 3)v$ constraints, where $v$ is the number of non-leaf nodes in the generated game tree and $v=\big((|A_1||A_2||S_1||S_2|)^{K}-1\big)/(|A_1||A_2||S_1||S_2|-1)$ in the worst case.
Although our goal here is to work with NNs, the computation of SW-SPNE and SW-SPCE in \thomref{thom-SW-SPE-D} also applies to conventional stochastic games, because the game tree construction can work for general transition functions with finite branching. The fact that our approach is not limited to NNs (or NNs of a certain class) is an advantage, and allows us to avoid the scalability issues suffered by the method of \citep{MEA-EB-PK-AL:20}, which represents a ReLU neural network as a set of constraints.


%\martatodo{Don't understand - there are no restrictions on the transition functions?} \ruitodo{Yes, the only restriction is that each state has a finite number of successor states.} 

% \ruitodo{Fixed. Please check}



\startpara{Frozen Subgame Improvement} The nonlinear programs in \thomref{thom-SW-SPE-D} can be solved efficiently to find an SW-SPNE or SW-SPCE when the joint action sets are small and the horizon is short. For larger problems, %a large joint action profile or a long horizon,
scalability is an issue because the numbers of variables and constraints both grow exponentially with the horizon. To deal with this, we propose an approximation algorithm called \emph{Frozen Subgame Improvement (FSI)} (\algoref{alg:FSI}) that trades optimality for scalability.  %we make a tradeoff between the optimality and scalability. In this section, we will propose an approximation algorithm called \emph{Frozen Subgame Improvement (FSI)} in \algoref{alg:FSI}.  

\begin{algorithm}[h]
	\caption{Frozen Subgame Improvement (FSI)}
	\textbf{Input:} NS-CSG $\csg$, reward $r$, equ.~type $\equilibrium$, init.~state $s$, $m_{\text{max}}$ 
	
	\textbf{Output:} an equilibrium $\mu$, equilibrium payoff vector $V$
	\begin{algorithmic}[1]
	    \State $(\mu,V)\gets\mathsf{GENERALIZED\_BI}(\csg,r,\equilibrium,s)$
	    \State $m \gets0$
	    \Repeat
	    \State $h\gets \mathsf{A\_HISTORY}(H_s^{<K},\mu,V)$
	    \State $P\gets$ \eqref{eq:MIQCP-Dete-ne} or \eqref{eq:MIQCP-Dete-ce} (depending on $\equilibrium$) after freezing $\mu^{h'},V^{h'}$ for each history $h'\in H_s^{<K}$ that is not a prefix of $h$ (say $h \in H_s^{\ell}$ for some $\ell<K$); 
	    \State $\{\mu^{*,h_{\leq \bar{\ell}}},V^{*,h_{\leq \bar{\ell}}}\}_{\bar{\ell} \leq \ell}\gets \mathsf{NP\_SOLVER}(P)$
	    \State $\mu\gets\{\mu^{*,h_{\leq \bar{\ell}}}\}_{\bar{\ell}\leq\ell}\cup\{\text{the frozen }\mu^{h'}\}$
	    \State $V\gets\{V^{*,h_{\leq \bar{\ell}}}\}_{\bar{\ell} \leq \ell}\cup\{\text{the frozen }V^{h'}\}$
	    \State $m \gets m +1$
	    \Until{$m = m_{\text{max}}$}
	    \State \Return{$\mu, V$}
	\end{algorithmic}
	\label{alg:FSI}
\end{algorithm}


 The main idea of FSI is as follows. First, GBI is used to find a feasible solution to \eqref{eq:MIQCP-Dete-ne} or \eqref{eq:MIQCP-Dete-ce}, i.e., an SPNE or SPCE, depending on the equilibrium type $\equilibrium \in \{ \textup{NE}, \textup{CE} \}$. Then, a history $h\in H_s^{<K}$ is selected, for example by sampling uniformly.  %by following a \marta{not clear what pattern} \rui{added} specific pattern (two patterns are introduced later). 
 We freeze the distributions over (joint) actions and equilibrium payoffs corresponding to the histories that are not prefixes of $h$. Thus, \eqref{eq:MIQCP-Dete-ne}, and similarly \eqref{eq:MIQCP-Dete-ce}, can be simplified into a nonlinear program with a smaller number of variables and constraints. Finally, a new solution is computed by merging the frozen part of the current solution and an optimal solution of the simpler nonlinear program. The process performs a predefined number $m_{\text{max}}$ of iterations.
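
To give a sense of the reduction (a back-of-the-envelope count based on the variables appearing in \eqref{eq:computation-NE} and \eqref{eq:computation-CE}, assuming histories are indexed so that $H_s^{0}=\{s\}$), suppose that $h\in H_s^{\ell}$. The only histories whose variables remain free are the $\ell+1$ prefixes $h_{\leq 0},\dots,h_{\leq \ell}$ of $h$, each contributing $\mu_1^{h'}$, $\mu_2^{h'}$ and $V^{h'}$ for Nash equilibria, or $\mu^{h'}$ and $V^{h'}$ for correlated equilibria. The simplified program therefore has at most
\begin{equation*}
(|A_1|+|A_2|+2)(\ell+1) \quad \text{or} \quad (|A_1||A_2|+2)(\ell+1)
\end{equation*}
variables, respectively. For $\ell=K-1$ this is linear in the horizon $K$, whereas the full programs carry one such block of variables for every non-leaf history of the game tree.
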

% and also freeze the action distributions of some heuristically selected agent corresponding to the histories which are prefixes of $h$
%
% \martatodo{Discuss how to select the history}
%
%\martatodo{This is confusing and needs to be presented better - FSI actually uses NE computation via GI, so a different algorithm from Algo 1}
In~\algoref{alg:FSI}, $\mathsf{GENERALIZED\_BI}(\cdot)$ computes an SPNE or SPCE $\mu$ and the associated equilibrium payoff vector $V$ using a simpler version of \algoref{alg-BI-SWNE}, in which an NE or CE is computed at step 7 instead of an SWNE or SWCE. $\mathsf{A\_HISTORY}(\cdot)$ returns a history; here, we sample a history from $H_s^{K-1}$ uniformly, and an alternative is presented in the Appendix.
$\mathsf{NP\_SOLVER}(\cdot)$ computes an optimal solution to a given nonlinear program.

%\marta{I think it only transpires later that there are two variants of FSI, state based and region based - this needs to be introduced more clearly} \rui{we only have one version. FSI over regions is more like a compromise}
For FSI, we have the following results:

% \marta{this is not clear} \ruitodo{Gabriel, please can you implement the second approach to find a history, although these two approaches will probably produce the similar results. At least we can compare them in the experiments.} 

\begin{thom}[FSI]\label{thom:performance-FSI}
If FSI is adopted to solve \eqref{eq:MIQCP-Dete-ne} (\eqref{eq:MIQCP-Dete-ce}, resp.) approximately, then:
\begin{enumerate}[label=(\roman*)]
    \item the pair $(\mu,V)$ is a feasible solution to \eqref{eq:MIQCP-Dete-ne} (\eqref{eq:MIQCP-Dete-ce}, resp.) at the end of each iteration $m$, that is, $\mu$ is an SPNE (SPCE, resp.) and $V$ is the equilibrium payoff vector;
    
    \item the social welfare $\sum_{i\in N}V^s_i$ is monotonically non-decreasing in $m$, and also monotonically non-decreasing in $m_{\text{max}}$.
\end{enumerate}
\end{thom}
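
Intuitively, and as an informal sketch rather than a full proof, both properties follow from the structure of the frozen program $P$ in \algoref{alg:FSI}. If $h'$ is not a prefix of $h$, then no successor of $h'$ is a prefix of $h$ either, so the constraints $C^{\textup{N},h'}$ (or $C^{\textup{C},h'}$) involve only frozen quantities and remain satisfied after the update, while the constraints at the prefixes of $h$ are enforced by $P$ itself. Moreover, the restriction of the current pair $(\mu,V)$ to the free variables is feasible for $P$, so any optimal solution $(\mu^*,V^*)$ of $P$ satisfies
\begin{equation*}
\mbox{$\sum\nolimits_{i\in N}$} V^{*,s}_i \ \geq\ \mbox{$\sum\nolimits_{i\in N}$} V^{s}_i ,
\end{equation*}
that is, the social welfare cannot decrease within an iteration.
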

% \begin{pro}[Improvement for different histories]
% During each iteration in the FSI, let $V$ and $V'$ be the equilibrium payoffs if $h$ and $h'$ are given by $\mathsf{A\_HISTORY}(\cdot)$, respectively. Then, $\sum_{i\in N}V^s(i)\ge \sum_{i\in N}V'^s(i)$ if $h'$ is a prefix of $h$. 
% \end{pro}



% and freezing $\mu_i^{h'}$ for each $h'\in H_s$ which is a prefix of $h$ for some heuristically selected agent $i$






% We do need to propose an algorithm which can compute an SW-SPE efficiently, because the optimization problems proposed above are intractable for a large joint action profile or a long horizon. Actually, the variables and constraints in these optimization problems are both exponential, and the constraints are quadratic. 

% I will try fixed-point theorems for the equilibrium payoff correspondence, and try to find a non-trivial bound for the social welfare and the associated equilibrium.

% We refer to partial observable stochastic games as stochastic games in which each agent can only access part of states at each step, i.e., partial observability. 


% Given $\sigma_1$, the computation of $\mathcal{BR}(\sigma_1)$ is equivalent to finding an optimal strategy in an integrated stochastic environment composed of the original environment and agent $1$. This integrated environment is an MDP with possibly continuous and high-dimensional states. However, the observation function for agent $2$ filters the unnecessary information in the environment state and only uses what the agent cares, like the states involved in the reward structure, as its percepts. Thus, the percets can be discrete and low-dimensional, because the strategy of agent $2$ often use part of the whole states.  

% \begin{figure*}[htp]
%     \centering
%     \includegraphics[width=180mm,height=80mm]{figure/MDP.pdf}
%     \caption{Given $\sigma_E$ and $\sigma_1$, the partition of the system when computing the set $\mathcal{BR}(\sigma_{1})$ of best responses of agent $2$ to $\sigma_1$.}
%     \label{fig:MDP}
% \end{figure*}

% \section{Complexity Analysis}
% The biggest challenge we are facing now is how to solve the induced optimization problems, because these problems are computationally intractable due to the following reasons:
% {\color{red}

% \begin{enumerate}
%     \item the constraints are non-linear, i.e., there exist quadratic constraints;
    
%     \item since we consider SPE, the strategy should depend on the memory or path, while the memoryless or state-based strategy fails in this case. However, the number of paths is exponential, while the number of local states in our paper is relatively small.
    
%     \item for the computation of SW-SPE, we have to solve an optimization problem with quadratic constraints and a huge number of variables which can be reals or integers.  
    
%     \item given an initial state, we can compute the state for each node in the game tree by forward induction, implying that the observation functions can be any functions. Actually, this is a model-free method, making our abstraction model no sense, because the characteristics of this model have no contribution to reduce the computation burden.
    
%     \item I wonder we need to reconsider the selection of the types of strategies and equilibria.
    
%     \item this work \cite{LRTZ06} could be helpful for the computation of SW-SPE. This work focuses on two-player turn-based games.
% \end{enumerate}

% }

% At each iteration, a simpler polynomial programming with at most $(|A_1|+|A_2|+2)K$ variables and $(2|A_1||A_2|+2|A_1|+2|A_2|+6)K$ constraints is solved. 

% Before discussing the optimality of the FSI, we introduce the generic games \cite{JCH:73,RDM-AM:97,SG-RW:12,MP-TH-JDF:19}. 

% \martatodo{What can we say about optimality (is this a local optimum)? complexity?}

% \begin{defi}[Generic game]
% A normal-form game is \emph{generic} if a small change of any one of the payoffs does not introduce new NEs or remove existing ones.
% \end{defi}

% It has been proved in \cite{JCH:73} and \cite{RDM-AM:97} that all generic games have finitely many NEs. A sufficient and easily verifiable condition for a game to be generic is that an agent is never indifferent between its pure strategies with respect to its payoffs.

% \begin{thom}[Optimality]
% If $\tau_{\textup{max}}$ is large enough, then the output $(\sigma,V,W_{0,s}^{\sigma})$ has the following properties
% \begin{enumerate}[label=(\roman*)]
%     \item for any history $h\in H_s^{K-1}$, $W_{0,s}^{\sigma}$ is the optimum of \eqref{eq:MIQCP-Dete} when $\mu^{h'}$ and $V^{h'}$ are frozen for all non-prefixes $h'\in H_s^{<K}$ of $h$;
%     \item furthermore, if the normal-form game induced by $V$ at each history $h\in H_s^{<K}$, by which $\sigma(h)$ is computed, is generic, then $W_{0,s}^{\sigma}$ is a local optimum of \eqref{eq:MIQCP-Dete}. 
% \end{enumerate}
% \begin{proof}
% Regarding $(i)$, it directly follows from \lemaref{thom:performance-FSI} and the boundedness of the optimum of \eqref{eq:MIQCP-Dete}.  

% Regarding $(ii)$, we denote by $\Gamma^h$ the normal-form game induced by $V$ at each history $h\in H_s^{<K}$, and $E(\Gamma^h)$ the set of NEs of $\Gamma^h$. Since $\Gamma^h$ is generic, it follows from \cite[Theorem 2.6.2]{EVD:91} that $\sigma(h)$ is isolated, that is, there exists a neighborhood $B$ of $\sigma(h)$ such that $B\cap E(\Gamma^h)=\{\sigma(h)\}$. Thus, $\sigma$ is isolated, implying that it is a local optimum of \eqref{eq:MIQCP-Dete}.
% \end{proof}


% \end{thom}

\input{figures/tex/FSI_with_region}

\startpara{FSI over Regions} If each agent has limited memory and takes actions conditioned only on the current state and stage, we can unfold the game into a graph in which each node at a stage represents exactly one state reachable at that stage, as in Fig.~\ref{fig:FSI_region}. Compared with the game tree, this graph has far fewer nodes when many histories of the same length end in the same state. %\marta{Not sure where the modified FSI is}
% A modified FSI is proposed for this graph by first sampling a history (Fig. \ref{fig:FSI_region}: left) and then optimising over a region of states %which can reach the history 
FSI can be directly adapted to this graph by first sampling a history (Fig.~\ref{fig:FSI_region}: left) and then optimising over a region of states
that contains all histories reaching the last state of the sampled history
(Fig.~\ref{fig:FSI_region}: right). 

%\marta{Are there any differences between NE and CE computation by FSI?} 
%\rui{Yes, the only difference is the optimisation problem solved at each iteration}

\startpara{Multi-agent}
SW-SPNE and SW-SPCE computation for \emph{multi-agent} ($n{>}2$) NS-CSGs can be performed
by replacing \eqref{eq:computation-NE} or \eqref{eq:computation-CE} with the encoding of NE/CE computation for the induced multi-agent normal-form game at each $h\in H_s^{<K}$.

\startpara{Complexity} We focus here on practical methods to compute equilibria, whose cost depends on the horizon $K$ and the size of the model (specifically the number of actions and agent states), as well as on the underlying solution method used to solve either normal-form games (at each state, for SWNE or SWCE) or nonlinear optimisation problems (for SW-SPNE or SW-SPCE). 
%Problem \eqref{eq:MIQCP-Dete-ne} has at most $(|A_1|+|A_2|+2)v$ variables and $(2|A_1||A_2|+2|A_1|+2|A_2|+4)v$ constraints, and \eqref{eq:MIQCP-Dete-ce} has at most $(|A_1||A_2|+2)v$ variables and $(|A_1||A_2| + |A_1|^2 + |A_2|^2 - |A_1| -|A_2| + 3)v$ constraints, where $v$ is the number of non-leaf nodes in the generated game tree and $v=\big((|A_1||A_2||S_1||S_2|)^{K}-1\big)/(|A_1||A_2||S_1||S_2|-1)$ in the worst case.
Computing NEs of a normal-form game with two %or more 
players is known to be PPAD-complete \citep{CDT09}. For extensive games, it has been proved that finding SPNEs for quantitative reachability objectives of a two-player game is PSPACE-complete \citep{BBG+19}. Computing SWCEs of a normal-form game can be done in polynomial time \citep{GZ89}.

From a practical perspective, any method that relies on finding all NEs in the worst case cannot be expected to achieve a running time that is polynomial with respect to the size of the game, as there can be exponentially many equilibria. GBI requires us to compute an SWNE or 
% \marta{should this one be SWCE?}
SWCE for all states that could be reached from a given initial state in $K$ steps. FSI relies on GBI as an initialisation step (\algoref{alg:FSI}, line 1). Furthermore, the optimisation problem \eqref{eq:MIQCP-Dete-ne} for computing an SW-SPNE has at most $(|A_1|+|A_2|+2)v$ variables and $(2|A_1||A_2|+2|A_1|+2|A_2|+4)v$ constraints, and the problem \eqref{eq:MIQCP-Dete-ce} for computing an SW-SPCE has at most $(|A_1||A_2|+2)v$ variables and $(|A_1||A_2| + |A_1|^2 + |A_2|^2 - |A_1| -|A_2| + 3)v$ constraints, where $v$ is the number of non-leaf nodes in the generated game tree; in the worst case, $v=\big((|A_1||A_2||S_1||S_2|)^{K}-1\big)/(|A_1||A_2||S_1||S_2|-1)$. %To the best of our knowledge, there is no definite complexity class for nonlinear optimisation problems but it is known that finding optimal values for even some of the quadratic instances is NP-hard.
%It is known that finding optimal values even for some of the quadratic instances of nonlinear optimisation problems is NP-hard.
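
For a concrete sense of scale (the sizes below are hypothetical and serve only as an example), take $|A_1|=|A_2|=|S_1|=|S_2|=2$ and $K=3$. The worst-case branching factor is $|A_1||A_2||S_1||S_2|=16$, so
\begin{equation*}
v=\frac{16^{3}-1}{16-1}=\frac{4095}{15}=273 .
\end{equation*}
With these numbers, \eqref{eq:MIQCP-Dete-ne} has at most $6\cdot 273=1638$ variables and $20\cdot 273=5460$ constraints, while \eqref{eq:MIQCP-Dete-ce} has at most $6\cdot 273=1638$ variables and $11\cdot 273=3003$ constraints. Extending the horizon to $K=4$ gives $v=(16^{4}-1)/15=4369$, i.e., roughly a sixteen-fold increase for one additional step.
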

