\section{Introduction}

Stochastic games~\citep{LSS:53} are a well-established model for the
formal design and analysis of probabilistic multi-agent systems.
In particular, \emph{concurrent stochastic games} (CSGs)
provide a natural framework for modelling a set of interactive, rational agents operating concurrently
within an uncertain or probabilistic environment.
Solution algorithms for finite-state CSGs are known~\citep{AHK07,AM04,CAH13} and,
more recently, techniques and tools for their formal modelling, analysis and verification
have been developed~\citep{MK-GN-DP-GS:21,MK-GN-DP-GS:20-2}
and applied to examples across robotics, computer security and networks.

In more complex scenarios,
for example sequential decision making in continuous-state or mixed discrete-continuous state environments,
CSGs are again a natural formalism for problems such as multi-agent reinforcement learning~\citep{RY-XD-ZS-YS-JRM-FB:21,PCA21}.
A recent trend in this setting is the use of neural networks (NNs)
to represent learnt approximations to value functions~\citep{SO-JP-CA-JPH-JV:17}
or strategies~\citep{RY-YW-AT-JH-PA-IM:17} for CSGs.
However, the scalability and efficiency of such approaches
are limited when NNs are used to manage multiple, complex aspects of the system.
To overcome this, a further promising direction is the use of \emph{neuro-symbolic} approaches.
These deploy NNs within certain data-driven components of the control problem,
e.g., for perception modules, and traditional symbolic methods
for others, e.g., nonlinear controllers.

In this paper, we work with the recently proposed formalism of
\emph{neuro-symbolic concurrent stochastic games} (NS-CSGs)~\citep{RY-GS-GN-DP-MK:22},
designed to model probabilistic multi-agent systems
comprising neuro-symbolic agents operating concurrently within a shared, continuous-state environment.
In~\citep{RY-GS-GN-DP-MK:22}, the \emph{zero-sum} control problem is considered,
namely to synthesise strategies for one set of agents who are aiming
to maximise their (discounted, infinite-horizon) expected reward,
while the other agents aim to minimise this value.
%
However, in practice, this is limiting:
even for the case of just two coalitions of agents,
they will often have distinct but not directly opposing goals,
which cannot be modelled in a zero-sum fashion.

To tackle this problem, we work with \emph{equilibria},
defined by a separate, independent objective for each agent.
These are particularly attractive since they ensure stability against deviations
by individual agents, improving the overall system outcomes.
We formalise the equilibrium synthesis problem for NS-CSGs,
considering two distinct variants: \emph{Nash equilibria} (NEs),
in which no agent has an incentive to deviate unilaterally from their strategy,
and \emph{correlated equilibria} (CEs), which allow agent coordination,
e.g., through public signals, and in which agents have no incentive to deviate from the resulting actions.
The latter can both simplify strategy synthesis and improve performance.
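%
% Illustrative sketch only: the notation below is assumed here and is not the formal definition.
For intuition, the NE condition can be sketched as follows, using notation that is assumed here purely for illustration:
write $\sigma = (\sigma_1,\dots,\sigma_n)$ for a strategy profile of the $n$ agents,
$V_i(\sigma)$ for the value of agent $i$'s objective under $\sigma$,
and $\sigma_{-i}[\sigma_i']$ for the profile in which only agent $i$ switches to strategy $\sigma_i'$.
Then $\sigma$ is an NE if
\[
V_i(\sigma) \;\geq\; V_i(\sigma_{-i}[\sigma_i']) \quad \mbox{for all agents } i \mbox{ and all strategies } \sigma_i' .
\]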

Our focus is on (undiscounted) \emph{finite-horizon} objectives.
This setting simplifies the analysis
(note that the existence of infinite-horizon NE for CSGs is an open problem~\citep{PB-NM-DS:14},
and the verification of non-probabilistic infinite-horizon reachability properties for neuro-symbolic games is undecidable~\citep{MEA-EB-PK-AL:20}),
and also has a number of useful applications, e.g., in receding horizon control.
%
Since multiple equilibria may exist,
we target \emph{social welfare (SW)} optimal equilibria,
which maximise the sum of the individual agent objectives.
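%
% Illustrative sketch only: the symbols below are assumed here and are not the formal definition.
As a rough sketch, with $V_i(\sigma)$ denoting the value of agent $i$'s objective under a strategy profile $\sigma$ of the $n$ agents
and $\mathit{Eq}$ the set of equilibria of the chosen type (both symbols are assumed here for illustration),
an SW-optimal equilibrium $\sigma^\star$ satisfies
\[
\sigma^\star \in \arg\max_{\sigma \in \mathit{Eq}} \; \sum_{i=1}^{n} V_i(\sigma) .
\]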

We also work with \emph{subgame-perfect equilibria} (SPE),
which are equilibria in every state of the game,
ensuring that optimality is preserved as later states of the game are reached~\citep{MJO:04,LRTZ06,DF-DL:09,DA-BB-Y:20}.
%
Crucially, we consider \emph{globally optimal} equilibria which,
from a fixed initial state, are optimal over the chosen time horizon.
This is in contrast to techniques for equilibria in finite-state CSGs~\citep{MK-GN-DP-GS:21,KNPS22},
which consider only local optimality at each time step in the finite-horizon setting.

We first adapt (classical) backward induction to NS-CSGs based on local optimality,
but show that it may find an arbitrarily bad SPE.
Then, for a fixed initial state, we show how to compute optimal equilibria
by unfolding the game tree (including invocation of the NN perception function)
and solving a \emph{nonlinear program}.
Since this suffers from limited scalability,
we then propose \emph{frozen subgame improvement} (FSI),
an approximation algorithm which iteratively solves nonlinear programs
to monotonically improve the social welfare.
%
Our approach is wholly different from the zero-sum (discounted, infinite-horizon) solution
of NS-CSGs in~\citep{RY-GS-GN-DP-MK:22},
which applies value/policy iteration to finite model abstractions
that rely on assumptions about the functions used to specify the model.

Finally, we implement our algorithms and evaluate them on two case studies,
a car-parking example and the VerticalCAS (VCAS) aircraft system for collision avoidance,
showing that they are capable of automatically generating equilibria that can improve over zero-sum strategies.

\startpara{Related Work}
%
Several papers have considered verification and synthesis of equilibria for stochastic games~\citep{FM-IM-IS-ET-LA-AC-HL:09,KH-BB:19,DF-ND-CJ-JSD:18,MK-GN-DP-GS:21}, aiming to prove that a game satisfies a given equilibrium-related specification and also to find such an equilibrium.
However, none of these support CSGs whose agents are partly realised via NNs.
The PRISM-games tool \citep{MK-GN-DP-GS:20-2} provides modelling, verification and equilibria synthesis for
(discrete-state) CSGs, including finite-horizon analysis via backward induction,
but for the simpler case of local optimality, as discussed above.
\citep{MK-GN-DP-GS:20-2} also supports infinite-horizon $\epsilon$-optimal social welfare Nash equilibria, and \citep{KNPS22} supports correlated equilibria with two types of optimality conditions,
computed using value iteration, but again only for discrete models.

Numerous methods have been proposed to compute SPEs since their introduction in the 1970s \citep{RS:75}. Most of these address the infinite horizon, for which fixed-point algorithms are the most common methods, from operator design for SPE payoff correspondence \citep{DA-BB-Y:20,TB-VB-AG-JR-MB:20,SY-YC-KLJ:17,MK-16,AB-BC:10}, to homotopy methods \citep{PL-CD:20}. For the finite horizon, which we consider here for reasons of decidability, backward induction is a simple and common bottom-up algorithm for finding an SPE efficiently. However, all these approaches fail to identify social-welfare-optimal SPEs (SW-SPEs) over a finite horizon.
In \citep{LRTZ06}, a polynomial-time algorithm is proposed for computing optimal SPEs for turn-based games played over trees, but it cannot deal with the concurrency in CSGs.

Neuro-symbolic computing has been attracting attention recently; see \citep{DK-11} and the surveys \citep{LL-AG-MG-MPPA-MV:20,LDR-SD-RM-GM:20}. The works of \citep{MEA-EB-PK-AL:20,MEA-EB-PK-AL:20-2} consider neuro-symbolic multi-agent systems represented as neural interpreted systems and study the finite-horizon verification problem for Alternating-time Temporal Logic, solved through reduction to a mixed-integer linear programming (MILP) problem, but do not consider equilibria properties. The agents are endowed with perception, similarly to ours, but are not stochastic.
