\newif\ifdraft\drafttrue
\newif\ifcr\crtrue
%\documentclass{uai2022} % for initial submission
\documentclass[accepted]{uai2022} % after acceptance, for a revised
                                    % version; also before submission to
                                    % see how the non-anonymous paper
                                    % would look like
%% There is a class option to choose the math font
% \documentclass[mathfont=ptmx]{uai2022} % ptmx math instead of Computer
                                         % Modern (has noticeable issues)
% \documentclass[mathfont=newtx]{uai2022} % newtx fonts (improves upon
                                          % ptmx; less tested, no support)
% NOTE: Only keep *one* line above as appropriate, as it will be replaced
%       automatically for papers to be published. Do not make any other
%       change above this note for an accepted version.

%% Choose your variant of English; be consistent
\usepackage[american]{babel}
% \usepackage[british]{babel}
\usepackage{comment}
\usepackage{amsfonts}  
\usepackage{nicefrac}      
\usepackage{microtype}     
\usepackage{amsmath}
\usepackage{mathrsfs}
\usepackage{amssymb}
\usepackage{listings}
\usepackage{bm}
\usepackage{diagbox}
\usepackage{graphicx}
\usepackage{amsthm}
\usepackage{mathabx}
\usepackage{xcolor}
\usepackage{caption}
\usepackage{subfigure}
\usepackage{stmaryrd}
\usepackage{wrapfig}
\usepackage[numbers, sort, square]{natbib}

%Table
\usepackage{array}
\newcolumntype{P}[1]{{\centering\arraybackslash}p{#1}}

\allowdisplaybreaks
%Algorithm stuff
%%
\usepackage{algorithm}
\usepackage{algpseudocode}
\algtext*{EndWhile}% Remove "end while" text
\algtext*{EndFor}
\algtext*{EndIf}% Remove "end if" text
\algnewcommand\INPUT{\item[\textbf{Input:}]}
\algnewcommand\PARAM{\item[\textbf{Parameters:}]}
\algnewcommand\OUTPUT{\item[\textbf{Output:}]}

%
% These are recommended to typeset algorithms but not required. See the subsubsection on algorithms. Remove them if you don't have algorithms in your paper.
%\usepackage{algorithm}
%\usepackage{algorithmic}

%

\newtheorem{lemma}{Lemma}
\newtheorem{assumption}{Assumption}
\newtheorem{proposition}{Proposition}
\newtheorem{definition}{Definition}
\newtheorem{corollary}{Corollary}
\newtheorem{theorem}{Theorem}
\newtheorem{remark}{Remark}

\newcommand{\abs}[1]{\left\lvert#1\right\rvert}
\newcommand{\act}{\mathcal{X}}%action set
\newcommand{\ball}{\mathbb{B}_{\epsilon}} %ball
\newcommand{\bx}{\bm{x}}%action of a player
\newcommand{\bs}{\bm{s}}%strategy of a player
\newcommand{\by}{\bm{y}}%a point
\newcommand{\B}{\mathcal{B}}%neighborhood
\newcommand{\ch}{\textsc{Chd}}%\newcommand{\ch}{\mathit{CHD}}%child
\newcommand{\des}{\textsc{Des}}
\newcommand{\F}{f}%function F
\newcommand{\G}{\mathcal{G}}
\newcommand{\leaf}{\textsc{Leaf}}%\newcommand{\leaf}{\mathit{LEAF}}%leaf
\newcommand{\li}{i}%shorthand for l,i
\newcommand{\nats}{\mathbb{N}}%natural numbers
\newcommand{\norm}[1]{\left\lVert#1\right\rVert}%norm
\newcommand{\pa}{\textsc{Pa}}%\newcommand{\pa}{\mathit{PA}}%parent
\renewcommand{\path}{\textsc{Path}}
\newcommand{\reals}{\mathbb{R}}%real numbers
\renewcommand{\u}{u}%pay-off
\newcommand{\X}{\mathcal{X}}


\newcommand{\sj}[1]{\ifdraft{\color{blue}[Shahin: {#1}]}\fi}
\newcommand{\mc}[1]{\ifdraft{\color{magenta}[Mithun: {#1}]}\fi}
\newcommand{\lz}[1]{\ifdraft{\color{red}[Zun: {#1}]}\fi}
\newcommand{\yv}[1]{\ifdraft{\color{red}[Eugene: {#1}]}\fi}

%%%For appendix
\newcommand{\lev}{\mathcal{L}}
\newcommand{\actvec}{\overline{\alpha}}
\newcommand{\Cinc}{\mathtt{C}^{\mathrm{inc}}}
\newcommand{\Cdec}{\mathtt{C}^{\mathrm{dec}}}
\newcommand{\Co}{\mathtt{C}}
\newcommand{\Cimp}{\mathtt{C}^{\mathrm{imp}}}
\newcommand{\Cnc}{\mathtt{C}^{\mathrm{NC}}}
\newcommand{\Ii}{I^0}
\newcommand{\Infec}{\mathtt{Infect}}
\newcommand{\Poi}{\mathrm{\textsc{Poisson}}}
\newcommand{\scg}{\mathtt{SC}}
\newcommand{\I}{\mathtt{I}}

%% Some suggested packages, as needed:
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
% \usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams

%% Provided macros
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)

%% Self-defined macros
\newcommand{\swap}[3][-]{#3#1#2} % just an example

\title{Solving Structured Hierarchical Games Using Differential Backward Induction\ifcr\thanks{The full technical version of this paper is available at~\url{https://arxiv.org/abs/2106.04663}.}\fi}

% The standard author block has changed for UAI 2022 to provide
% more space for long author lists and allow for complex affiliations
%
% All author information is automatically removed by the class for the
% anonymous submission version of your paper, so you can already add your
% information below.
%
% Add authors







\author[1]{\qquad Zun Li}
\author[2]{\quad Feiran Jia}
\author[3]{\quad Aditya Mate}
\author[4]{\quad Shahin Jabbari}
\author[1]{\quad Mithun Chakraborty}
\author[3]{\quad Milind Tambe}
\author[5]{\qquad Yevgeniy Vorobeychik}
% Add affiliations after the authors
\affil[1]{%
    University of Michigan, Ann Arbor\\
    \texttt{\{lizun,dcsmc\}@umich.edu}
}
\affil[2]{%
    Pennsylvania State University\\
    \texttt{fzj5059@psu.edu}
}
\affil[3]{%
    Harvard University\\
    \texttt{\{aditya\_mate,milind\_tambe\}@g.harvard.edu}
  }
\affil[4]{%
    Drexel University\\
    \texttt{shahin@drexel.edu}
  }  
\affil[5]{%
    Washington University in St. Louis\\
    \texttt{yvorobeychik@wustl.edu}
  }  

\begin{document}
\maketitle

%\input{abstract}
\begin{abstract}
From large-scale organizations to decentralized political systems, hierarchical strategic decision making is commonplace.
We introduce a novel class of \emph{structured hierarchical games (SHGs)} that formally capture such hierarchical strategic interactions.
In an SHG, each player is a node in a tree, and strategic choices of players are sequenced from root to leaves, with root moving first, followed by its children, then followed by their children, and so on until the leaves.
A player's utility in an SHG depends on its own decision, and on the choices of its parent and \emph{all} the tree leaves.
SHGs thus generalize simultaneous-move games, as well as Stackelberg games with many followers. 
We leverage the structure of both the sequence of player moves and the payoff dependence to develop a gradient-based backpropagation-style algorithm, which we call \emph{Differential Backward Induction (DBI)}, for approximating equilibria of SHGs.
We provide a sufficient condition for convergence of DBI and 
demonstrate its efficacy in finding approximate equilibrium solutions to several SHG models of hierarchical policy-making problems.
\end{abstract}

\section{Introduction}\label{sec:intro}
%\input{intro}
The COVID-19 pandemic has revealed considerable strategic tension among the many parties involved in decentralized hierarchical policy-making.
For example, recommendations by the World Health Organization are sometimes heeded, and other times discarded by nations, while subnational units, such as provinces and urban areas, may in turn take a policy stance (such as on lockdowns, mask mandates, or vaccination priorities) that is not congruent with national policies.
Similarly, in the US, policy recommendations at the federal level can be implemented in a variety of ways by the states, while counties and cities, in turn, may comply with state-level policies, or not, potentially triggering litigation~\cite{hill2021public}.
Central to all these cases is that, besides this strategic drama, what ultimately determines infection spread is how policies are implemented \emph{at the lowest level}, such as by cities and towns, or even individuals.
Similar strategic encounters routinely play out in large-scale organizations, where actions throughout the management hierarchy are ultimately reflected in the decisions made at the lowest level (e.g., by the employees who are ultimately involved in production), and these lowest-level decisions play a decisive role in the organizational welfare.

We propose a novel model of hierarchical decision making which is a natural stylized representation of strategic interactions of this kind.
Our model, which we term \emph{structured hierarchical games (SHGs)}, represents each player by a node in a tree hierarchy.
The tree plays two roles in SHGs.
First, it captures the sequence of moves by the players: the root (the lone member of level 1 of the hierarchy) makes the first strategic choice, its children (i.e., all nodes in level 2) observe the root's choice and follow, their children then follow in turn, and so on, until we reach the leaf node players who move upon observing their predecessors' choices. Second, the tree partially captures strategic dependence: a player's utility depends on its own strategy, that of its parent, and the strategies of \emph{all of the leaf nodes}.
The sequence of moves in our model naturally captures the typical sequence of decisions in hierarchical policy-making settings, as well as in large organizations, while the utility structure captures the decisive role of leaf nodes (e.g., individual compliance with vaccination policies), as well as hierarchical dependence (e.g., employee dependence on a manager's approval of their performance, or state dependence on federal funding).
Significantly, the \emph{SHG} model generalizes a number of well-established models of strategic encounters, including (a) simultaneous-move games (captured by a 2-level SHG with the root having a single dummy action), (b) Stackelberg (leader-follower) games (a 2-level game with a single leaf node)~\cite{von1952theory,fiez2020implicit}, and (c) single-leader multi-follower Stackelberg games (e.g., a Stackelberg security game with a single defender and many attackers)~\cite{basilico2016methods,coniglio2020computing}.

Our second contribution is a gradient-based algorithm for approximately computing subgame-perfect equilibria of \emph{SHGs}.
Specifically, we propose \emph{Differential Backward Induction (DBI)}, which is a backpropagation-style gradient ascent algorithm that leverages both the sequential structure of the game and the utility structure of the players.
As \emph{DBI} involves simultaneous gradient updates of players in the same level (particularly at the leaves), convergence is not guaranteed in general (as is also the case for best-response dynamics~\cite{fudenberg1998theory}).
Viewing \emph{DBI} as a dynamical system, we provide a sufficient condition for its convergence to a stable point.
Our results also imply that in the special case of two-player zero-sum Stackelberg games, \emph{DBI} converges to a local Stackelberg equilibrium~\cite{fiez2020implicit,wang2019solving}.

Finally, we demonstrate the efficacy of DBI in finding approximate equilibrium solutions to several classes of SHGs.
First, we use a highly stylized class of SHGs with polynomial utility functions to compare DBI with five baseline gradient-based approaches from prior literature.
Second, we use DBI to solve a recently proposed game-theoretic model of 3-level hierarchical epidemic policy making.
Third, we apply DBI to solve a hierarchical variant of a public goods game, which naturally captures the decentralization of decision making in public good investment decisions, such as investments in sustainable energy.
Fourth, we evaluate DBI in the context of a hierarchical security investment game, where hierarchical decentralization (e.g., involving federal government, industry sectors, and particular organizations) can also play a crucial role.
In all of these, we show that DBI significantly outperforms the state-of-the-art approaches that can be applied to solve games with hierarchical structure.

%\input{related}
\noindent{\bf Related Work } 
SHGs generalize both simultaneous-move games and Stackelberg games
with multiple followers~\citep{leyffer2005solving,basilico2016methods}.
They are also related to \textit{graphical games}~\cite{kearns2013graphical} in capturing utility dependence structure, although SHGs also capture sequential structure of decisions.
Several prior approaches use gradient-based methods for solving games with particular structure. 
A prominent example is generative adversarial networks (GANs), though these are zero-sum games~\citep{goodfellow2014generative,jin2020local,nagarajan2017gradient,daskalakis2018limit,mertikopoulos2019learning,mescheder2017numerics}. Ideas from learning GANs have been adopted in gradient-based approaches to solve multi-player general-sum games~\citep{mazumdar2020gradient,balduzzi2018mechanics,chasnov2020convergence,ibrahim2020linear,letcher2020impossibility,lin2020finite,mertikopoulos2019learning}. However, all of these approaches assume a simultaneous-move game. 
A closely-related thread to our work considers gradient-based methods for bi-level 
optimization~\citep{li1987distributed,shaban2019truncated}. %\citet{fiez2020implicit}~and~\citet{wang2019solving} 
Several related efforts consider gradient-based learning in Stackelberg games, and also use the implicit function theorem to derive gradient updates~\citep{amin2016gradient,fiez2020implicit,nguyen2021partial,wang2019solving,wang21}. 
We significantly generalize these ideas by considering an arbitrary hierarchical game structure.

\citet{jia2021game} recently considered a stylized 3-level SHG for pandemic policy making, and proposed several non-gradient-based algorithms for this problem. 
We compare with their approach in Section~\ref{sec:exp}.

\section{Structured Hierarchical Games}\label{sec:framework}
%\input{model}
\paragraph{Notation}
We use bold lower-case letters to denote vectors. 
Let $\F$ be a function of the form $\F(\bx, \by):\reals^{d}\times\reals^{d'}\rightarrow\reals^{d''}$. We use $\nabla_{\bx}\F$ to denote the partial derivative of $\F$ with respect to $\bx$. When there is functional dependency between $\bx$ and $\by$, we use $D_{\bx}\F$ to denote the total derivative of $\F(\bx, \by(\bx))$ with respect to $\bx$. We use $\nabla^2_{\bx, \bx}\F$ and $\nabla^2_{\bx, \by}\F$ to denote the second-order partial derivatives and $D^2_{\bx, \bx}\F$ to denote the second-order total derivative of $\F$. For a mapping $\F:\reals^d\rightarrow \reals^d$, we use $\F^t(\bx)$ to denote $t$ iterative applications of $\F$ on $\bx$. For mappings $\F_1:\reals^d\rightarrow \reals^d$ and $\F_2:\reals^d\rightarrow \reals^d$, we define $(\F_1\circ \F_2)(\bx)\triangleq \F_1(\F_2(\bx))$ and $(\F_1+ \F_2)(\bx)\triangleq \F_1(\bx)+\F_2(\bx)$.
Moreover, for a given $\epsilon \in \reals^{> 0}$ and $\bx\in \reals^d$, we define the $\epsilon$-ball around $\bx$ as $\ball(\bx)=\{\bx^\prime\in \reals^{d}\mid\|\bx-\bx^\prime\|_2<\epsilon\}$. Finally, 
$\bm{I}$ denotes an identity matrix.
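
As a small worked illustration of the difference between the partial and the total derivative (a toy function chosen purely for exposition), take scalars $x$ and $y$ with $\F(x,y)=xy$ and the functional dependence $y(x)=x^2$. Then $\nabla_{x}\F=y$, whereas the chain rule gives
\begin{equation*}
D_{x}\F=\nabla_{x}\F+\left(\nabla_{y}\F\right)\left(D_{x}y\right)=y+x\cdot 2x=3x^2,
\end{equation*}
which agrees with differentiating $\F(x,y(x))=x^3$ directly.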

\paragraph{Formal Model}
A structured hierarchical game (SHG) $\G$ consists of the set $\mathscr{N}$ of $n$ players. 
Each player $i$ is associated with a set of actions $\act_{i}\subseteq\mathbb{R}^{d_i}$.
The players are partitioned across $L$ levels, where $\mathscr{N}_l$ is the set of $n_l$ players occupying level $l$.
Let $l_i$ denote the level occupied by player $i$.
This \emph{hierarchical} structure of the game is illustrated in Figure~\ref{fig:model} where players correspond to nodes and levels are marked by dashed boundaries. The hierarchy plays two crucial roles: 1) it determines the order of moves, and 2) it partly determines utility dependence among players.
Specifically, the temporal pattern of actions is as follows: level 1 has a single player, the \textit{root}, who chooses an action first, followed by all players in level 2 making simultaneous choices, followed in turn by players in level 3, and so on until the \emph{leaves} in the final level $L$. 
Players of level $l$ only observe the actions chosen by all players of levels $1,2,\dotsc,l-1$, but not those of their peers in the same level. 
\begin{figure}[ht!]
	\centering
	\includegraphics[width=1\columnwidth]{figures/SHG_seq_new.png}
	\caption{Schematic representation of an SHG. 
	The utility of player 
	$\li$
	can have direct functional dependence \emph{only} on the joint action of \emph{all} shaded players.}
	\label{fig:model}
\end{figure}
So, for example, pandemic social distancing and vaccination policies in the US are initiated by the federal government (including the Centers for Disease Control and Prevention), which acts as the root in our game model. States (second-level nodes) subsequently institute their own policies, counties (third-level nodes) react by determining their own, and the behavior of people (leaf nodes) is ultimately influenced, but not determined, by the guidelines and enforcement policies of the local county or city.

Next, we describe the utility structure of the game as entailed by the SHG hierarchy.
Each player $i$ in level $l_i > 1$ (i.e., any node other than the root) has a \emph{unique parent} in level $l_i-1$; we denote the parent of node $i$ by $\pa(i)$.
A player's utility function is determined by 1) its own action, 2) the action of its parent, and 3) the actions of \emph{all} players in level $L$ (i.e., all \emph{leaf} players).
To formalize, let $\bx_l$ denote the joint action profile of all players in level $l$.
Player $i$'s utility function then has the form $\u_{i}(x_i, \bx_L)$ if $l_i = 1$, $\u_{i}(x_i, x_{\pa(i)}, \bx_L)$ if $1 < l_i < L$, and $\u_{i}(x_i, x_{\pa(i)}, \bx_{L,-i})$ if $l_i = L$, where $\bx_{L,-i}$ is the action profile of all players in level $L$ other than $i$.
For example, in our running pandemic policy example, the utility of a county depends on both the policy and enforcement strategy of its state (its \emph{parent}) and on the ultimate pandemic spread and economic impact within it, both determined largely by the behavior of the county residents (\emph{leaf nodes}).
Note the considerable generality of the SHG model.
For example, an arbitrary simultaneous-move game is an SHG with 2 levels and a ``dummy'' root node (utilities of all leaves depend on one another's actions), and an arbitrary Stackelberg game (e.g., Stackelberg security game), even with many followers, can be modeled as a 2-level SHG with the leader as root and followers as leaves.
Furthermore, while we have defined SHGs with respect to real-vector player action sets, it is straightforward to represent mixed strategies of finite-action games in this way by simply using a softmax function to map an arbitrary real vector into a valid mixed strategy.
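
To make the utility structure concrete, consider a hypothetical three-level SHG (an instance introduced here only for illustration) with root $1$, level-2 players $2$ and $3$, and leaves $4$ and $5$ (children of $2$) and $6$ (child of $3$). Then $\bx_L=\langle x_4,x_5,x_6\rangle$, and the utilities take the forms
\begin{align*}
&\u_1(x_1,\bx_L),\quad \u_2(x_2,x_1,\bx_L),\quad \u_3(x_3,x_1,\bx_L),\\
&\u_4(x_4,x_2,\langle x_5,x_6\rangle),\quad \u_5(x_5,x_2,\langle x_4,x_6\rangle),\\
&\u_6(x_6,x_3,\langle x_4,x_5\rangle).
\end{align*}
In particular, the utility of leaf $4$ has no direct dependence on $x_1$ or $x_3$; these actions can affect it only through the choices they induce at $x_2$ and at the other leaves.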

%\input{solution-concept}
\noindent{\bf Solution Concept }
Since an SHG has important sequential structure, it is natural to consider the \emph{subgame perfect equilibrium (SPE)} as the solution concept~\citep{osborne2004introduction}.
Here, we focus on pure-strategy equilibria.
To begin, we note that in SHGs, the strategies of players in any level $l>1$ are, in general, functions of the complete history of play in levels $1,\dots,l-1$, which we denote by $h_{<l}=(\bx_1, \bx_2,\dotsc,\bx_{l-1})$.
Formally, a (pure) strategy of a player $i$ is denoted by $s_i(h_{<l})$, which deterministically maps an arbitrary history $h_{<l}$ into an action $x_i \in \act_i$.
A \emph{Nash equilibrium} of an SHG is then a strategy profile $\bs =(s_{1},\dotsc,s_{i},\dotsc,s_n)$ such that for all $i \in \mathscr{N}$, $u_i(s_i,\bm{s_{-i}}) \ge u_i(s_i',\bm{s_{-i}})$ for every alternative strategy $s_i'$ of player $i$.
Here, we denote the realized payoff of $i$ from profile $\bm{s}$ by $u_i(s_i,\bm{s_{-i}})$.
Next, we define a \emph{level-$l$-subgame} given $h_{<l}$ as an SHG that includes only players at levels $\ge l$, with actions chosen in levels $<l$ fixed to $h_{<l}$.
A strategy profile $\bs$ is a \emph{subgame perfect equilibrium} of SHG if it is a Nash equilibrium of every level-$l$-subgame of SHG for every $l$ and history $h_{<l}$.
We prove in \ifcr the long version \else appendix~\ref{sec:app-spe} \fi that our definition of SPE is equivalent to the standard SPE in an extensive-form representation of SHG.

While in principle we can compute an SPE of an SHG using backward induction, this cannot be done directly (i.e., by complete enumeration of actions of all players) as actions are real vectors.
Moreover, even discretizing actions is of little help, as the hierarchical nature of the game leads to exponential explosion of the search space.
We now present a gradient-based approach for approximating SPE along the equilibrium path in an SHG that leverages the game structure to derive backpropagation-style gradient updates.


\section{Differential Backward Induction}\label{sec:algo}
%\input{solution}
In this section, we describe our gradient-based algorithm, Differential Backward Induction (DBI), for approximating an SPE (by which we hereinafter mean finding a joint-action profile $\bx$ that constitutes a subgame-perfect equilibrium path), and then analyze its convergence.
Just as gradient ascent does not, in general, identify a globally optimal solution to a non-convex optimization problem, DBI in general yields a solution which only satisfies first-order conditions (see Section~\ref{sec:analysis} for further details). 
Moreover, we leverage the structure of the utility functions to focus computation on an SPE in which strategies of players are only a function of their immediate parents.\footnote{Note that while we cannot guarantee that an SPE exists in SHGs in general, let alone those possessing the assumed structure, we find experimentally that our approach often yields good SPE approximations.}

In this spirit, we define \emph{local} best response functions $\phi_{i}:\reals^{d_{\pa(i)}}\rightarrow\reals^{d_i}$ mapping a player $i$'s parent's action $x_{\pa(i)}$ to $i$'s action $x_i$; note that the notation $\phi_i$ is distinct from $s_i$ above for $i$'s strategy to emphasize the fact that $\phi_i$ is only locally optimal.
Now, suppose that a player $i$ is in the last level $L$.
Local optimality of $\phi_i$ implies that if $x_i = \phi_i(x_{\pa(i)})$, then 
$\nabla_{x_{i}}\u_{i}\left(x_{i}, x_{\pa(i)}, \bx_{L,-i}\right)=0$ and $\nabla^2_{x_{i},x_{i}}\u_{i}\left(x_{i}, x_{\pa(i)}, \bx_{L,-i}\right) \prec 0.$\footnote{For simplicity, we omit degenerate cases where $\nabla^2_{x_{i},x_{i}}\u_{i}=0$ and assume all local maxima are strict.}

Let $\phi_{l}$
denote the local best response for all the players in level $l$ given the actions of all players in level $l-1$.
We can compose these local best response functions to define the function $\Phi_{l}:=\phi_{L}\circ\phi_{L-1}\circ\dotsc\circ\phi_{l+1}:\reals^{d_{n_l}}\rightarrow\reals^{d_{n_L}}$, i.e., the local best response of players in the last level $L$ given the actions of the players in level $l$.\footnote{Note that in particular $\Phi_{L-1} = \phi_L$, while $\Phi_L$ is the identity map (an empty composition).} 
Then for any player $i$ in level $l_i < L$, $D_{x_{i}}\u_{i}\left(x_i, x_{\pa(i)}, \Phi_{l}\left(\langle x_{i},\bx_{l,-i}\rangle\right)\right)=0$  and  $D^2_{x_{i},x_{i}}\u_{i}\left(x_{i}, x_{\pa(i)}, \Phi_{l}\left(\langle x_{i},\bx_{l,-i}\rangle\right)\right)\prec0$,
where $D_{x_{i}}$ is the total derivative with respect to $x_{i}$ (as $\Phi_{l}(\langle x_{i},\bx_{l,-i}\rangle)$ is also a function of $x_{i}$). 
Note that the functions $\phi$ and $\Phi$ are \emph{implicit}, capturing the functional dependencies between actions of players in different levels at the local equilibrium.

Throughout, we make the following standard assumption on the utility functions~\citep{dontchev2009implicit, wang2019solving}.
\begin{assumption}
\label{asp:1}
For any $x_{i}\in\X_{i}$, the second-order partial derivatives of the form $\nabla^2_{x_{i}, x_{i}}\u_{i}$ are non-singular.
\end{assumption}

\subsection{Algorithm}

The DBI algorithm works in a bottom-up manner, akin to back-propagation: for each level $l$, we compute the total derivatives (gradients) of the utility functions and local best response maps ($\phi$, $\Phi$) based on analytical expressions that we derive below.
We then propagate this information up to level $l-1$, as it is used to compute gradients for that level, and so on until level 1.
Algorithm~\ref{alg:dbw} gives the full DBI algorithm.
%, which assumes that total derivatives are given.
In this algorithm, $\ch(i)$ denotes the set of children of player $i$ (i.e., nodes in level $l_i+1$ for whom $i$ is the parent).
\begin{algorithm}[ht!]
\begin{algorithmic}
\INPUT{An SHG instance $\G$}
\PARAM{Learning rate $\alpha$, maximum number of iterations $T$ for gradient update}
\OUTPUT{A strategy profile}
\State 
Randomly initialize $\bx^0=\langle \bx_1^0, \ldots, \bx_L^0 \rangle$ 
%\Comment{Initialization} 
\For{$t=1,2,\dotsc, T$}
%\Comment{Loop over number of gradient update iterations}
    \For{$l=L, L-1, \dotsc, 1$} 
    %\Comment{Backward induction loop}
    		\For{$i=1,2,\dotsc, n_l$}
    		%\Comment{Loop over players in the level}
    		\If{$l=L$} 
    		%\Comment{Check for the last-level players}
    		\State Back-propagate 
    		%$\bm{I}$ as 
    		$D_{x_{i}}\Phi_{i} = \bm{I}$ to $\pa(i)$ %\Comment{$\Phi_{i} = \phi_{i}=x_{i}$}
    		\State Set $x^t_{i}\leftarrow x^{t-1}_{i}+\alpha \nabla_{x_{i}}u_{i}$ 
    		%\Comment{Gradient update step}
    		\Else
            \State Compute $\nabla_{x_{i}}u_{i}, \nabla_{\bx_L}u_{i}$ at $\bx^{t-1}$
             \State Compute $D_{x_{i}}\phi_{j}, \forall j \in\ch(i)$ (Eqn.~\eqref{eq:hessian_con})
    	    \State Compute $D_{x_{i}}\Phi_l$ 
    	    %using $D_{x_{j}}\Phi_{j}$ for $j\in \ch(i)$ 
    	     (Eqn.~\eqref{eq:bottom-up})
    	    %propagated below \Comment{ Equation~(\ref{eq:bottom-up})}
    		\State Back-propagate $D_{x_{i}}\Phi_l$ to $\pa(i)$
            \State Compute $D_{x_{i}}u_{i}=\nabla_{x_{i}}u_{i}+\nabla_{\bx_L}u_{i}D_{x_{i}}\Phi_l$
    		    \State Set $x^t_{i}\leftarrow x^{t-1}_{i}+\alpha D_{x_{i}}u_{i}$ %\Comment{Gradient update step}
    		    \EndIf
	  \EndFor
    	      \EndFor
\EndFor
\State Return $\bx^T$ %\Comment{Output after $T$ iterations}
\end{algorithmic}
\caption{Differential Backward Induction (DBI)}\label{alg:dbw}
\end{algorithm}
DBI works in a backward message-passing manner, comparable to back-propagation: after each player has computed its total derivative, it passes (back-propagates) $D_{x_{\li}}\Phi_l$ to its direct parent; this information is, in turn, used by the parent to compute its own total derivative, which is passed to its own parent, and so on.

Algorithm~\ref{alg:dbw} takes the total derivatives as given.
We now derive closed-form expressions for these.
We start from the last level $L$. Given the actions of players in level $L-1$, 
the total derivative of a player $i \in \mathscr{N}_L$ with respect to $x_i$ is
\begin{equation}
\label{eq:last-layer}
D_{x_{i}}\u_{i}\left(x_{i}, x_{\pa(i)}, \bx_{L,-i}\right)=\nabla_{x_{i}}\u_{i}.
\end{equation}
For a player $i$ in level $L-1$, the total derivative (at a local best response) is
\begin{align}
 D_{x_{i}}\u_{i}(x_{i}, x_{\pa(i)}, &\phi_{L}(\langle x_{i},\bx_{L-1,-i}\rangle))
 \notag \\
&
=\nabla_{x_{i}}\u_{i}+\left(\nabla_{\bx_{L}}u_{i}\right)\left(D_{x_{i}}\phi_{L}\right), 
\label{eq:before-last-layer}
\end{align}
where $\nabla_{\bx_L}u_{i}$ is a $1\times d_{n_L}$ vector and $D_{x_{i}}\phi_{L}$ is a $d_{n_L}\times d_i$ matrix.
The technical challenge here is to derive the term
$D_{x_{i}}\phi_{L}$ for $i \in \mathscr{N}_{L-1}$.
%$D_{\bx_{L-1, i}}\phi_{L}$.
Recall that $\phi_{L}$ is the vectorized concatenation of the $\phi_{j}$ functions for $j \in \mathscr{N}_L$.
Since the local best response strategy of a player in level $L$ only depends on its parent in level $L-1$,
 the only terms in $\phi_{L}$ that depend on $x_{i}$ are the actions of $\ch(i)$ in level $L$.
Consequently, it suffices to derive $D_{x_{i}}\phi_{j}$ for  $j\in \ch(i)$. 
Note that for these players $j$, 
$\nabla_{x_{j}}u_{j}=0$ (by local optimality of $\phi_{L}$).
We will use this first-order condition to derive the expression for the total derivative using the \emph{implicit function theorem}.

\begin{theorem}[Implicit Function Theorem (IFT) {\cite[Theorem 1B.1]{dontchev2009implicit}}]
\label{thm:imp}
Let $\F(\bx_1, \bx_2):\reals^{d}\times \reals^{d}\rightarrow\reals^{d}$ be a continuously differentiable function in a neighborhood of $(\bx_1^*, \bx_2^*)$ such that $\F(\bx_1^*, \bx_2^*)=0$.
Also suppose $\nabla_{\bx_2}f$, the Jacobian of $\F$ with respect to $\bx_2$, is non-singular at $(\bx_1^*, \bx_2^*)$.
Then around a neighborhood of $\bx_1^*$, we have a local diffeomorphism $\bx_2^*(\bx_1): \reals^{d}\rightarrow \reals^{d}$ such that
 $D_{\bx_1}\bx_2=-\left(\nabla_{\bx_2}f\right)^{-1}\nabla_{\bx_1}f$.
\end{theorem}
To use Theorem~\ref{thm:imp}, we set $\F=\nabla_{x_{j}}u_{j}$ (which satisfies the conditions of Theorem~\ref{thm:imp} by Assumption~\ref{asp:1}), $\bx_1=x_{i}$, and $\bx_2=x_{j}$ (recall that $j \in \ch(i)$). 
By IFT, there exists $\phi_j(x_i)$ such that
$D_{{x_{i}}}\phi_{j}=-(\nabla_{x_{j}, x_{j}}^2\u_{j})^{-1}\nabla_{x_{j}, x_{i}}^2\u_{j}.$
Define $\nabla^2_{j} := \nabla_{x_{j}, x_{i}}^2\u_{j}$. Then 
\begin{align*}
\left(\nabla_{\bx_L}\u_{i}\right)\left(D_{x_{i}}\phi_{L}\right)&=\sum_{j\in \ch(i)}\left(\nabla_{x_{j}}\u_{i}\right)D_{x_{i}}\phi_{j}
\\
&=-\sum_{j\in \ch(i)}\left(\nabla_{x_{j}}\u_{i}\right)(\nabla_{x_{j}, x_{j}}^2\u_{j})^{-1}\nabla^2_{j}.
\end{align*}
Plugging this into Equation~\eqref{eq:before-last-layer}, we obtain
\begin{align}
D_{x_{i}}\u_{i}&\left(x_{i}, x_{\pa(i)}, \phi_{L}\left(\bx_{L-1}\right)\right)\notag \\ &=\nabla_{x_{i}}\u_{i}-\sum_{j\in \ch(i)}\left(\nabla_{x_{j}}\u_{i}\right)(\nabla_{x_{j}, x_{j}}^2\u_{j})^{-1}\nabla^2_{j}. \label{eq:before-last-layer-2}
\end{align}
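To see Equation~\eqref{eq:before-last-layer-2} at work, consider a hypothetical scalar two-level SHG (a toy instance of our own, with constants $a,b\in\reals$ chosen for illustration) consisting of a root $i$ and a single leaf $j$, where $\u_j(x_j,x_i)=-\tfrac{1}{2}x_j^2+a x_i x_j$ and $\u_i(x_i,x_j)=-\tfrac{1}{2}x_i^2+b x_j$. The leaf's first-order condition $\nabla_{x_j}\u_j=-x_j+ax_i=0$ gives $\phi_j(x_i)=ax_i$. Since $\nabla^2_{x_j,x_j}\u_j=-1$ and $\nabla^2_{x_j,x_i}\u_j=a$, the IFT expression yields the same slope, $D_{x_i}\phi_j=-(-1)^{-1}a=a$. Consequently,
\begin{equation*}
D_{x_i}\u_i=\underbrace{-x_i}_{\nabla_{x_i}\u_i}-\underbrace{b}_{\nabla_{x_j}\u_i}\,(-1)^{-1}a=-x_i+ab,
\end{equation*}
which matches the derivative of $\u_i(x_i,\phi_j(x_i))=-\tfrac{1}{2}x_i^2+ab\,x_i$ and vanishes at $x_i=ab$.
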

For a level $l<L-1$, the total derivative of player $i \in \mathscr{N}_l$ in a local best response is
$D_{x_{i}}\u_{i}=\nabla_{x_{\li}}\u_{\li}+\left(\nabla_{\bx_L}\u_{\li}\right)\left(D_{x_{\li}}\Phi_l\right),$
where

\begin{align}
D_{x_{\li}}\Phi_l&=\left(D_{\bx_{l+1}}\Phi_{l+1}\right)\left(D_{x_{\li}}\bx_{l+1}\right)\nonumber\\
&=\sum_{j\in\ch(i)}\left(D_{x_{j}}\Phi_{l+1}\right)\left(D_{x_{\li}}\phi_{j}\right).
\label{eq:bottom-up}
\end{align}
Applying IFT, we get
\begin{equation}
D_{x_{\li}}\phi_{j}=-(\nabla_{x_{j}, x_{j}}^2u_{j})^{-1}\nabla_{x_{j}, x_{\li}}^2u_{j},
\label{eq:hessian_con}
\end{equation}
for $j\in \ch(\li)$.
We can apply the above procedure recursively for $D_{\bx_{l+1}}\Phi_{l+1}$ to derive the total derivative for players $i \in \mathscr{N}_l$ for $l<L-1$:
\begin{align}
\label{eq:before-before-last}
D_{x_{\li}}\u_{\li} = \nabla_{x_{\li}}\u_{\li} &+ \left(\sum\limits_{j\in \leaf(\li)}(-1)^{L-l}\nabla_{x_{j}}\u_{\li}\right. \nonumber \\
&\left. \prod\limits_{\eta\in \path(j \rightarrow \li)}\left(\nabla^2_{x_{\eta},x_{\eta}}\u_{\eta}\right)^{-1}\nabla^2_{x_{\eta},x_{\pa(\eta)}}\u_{\eta}\right),
\end{align}
where $\leaf(\li)$ denotes the set of leaf players that are descendants of $\li$, and $\path(j\rightarrow \li)$ is the ordered list of nodes (players) lying on the unique path from $j$ to $\li$, excluding $\li$. Note that Equation~\eqref{eq:before-before-last} is a generalization of Equation~\eqref{eq:before-last-layer-2}, where $\path$ consists only of the leaf player. 
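
As a sanity check of Equation~\eqref{eq:before-before-last}, consider a hypothetical three-level chain (a toy instance of our own, with scalar actions and constants $a,b,c\in\reals$): root $1$, its child $2$, and the leaf $3$, with $\u_3=-\tfrac{1}{2}x_3^2+ax_2x_3$, $\u_2=-\tfrac{1}{2}x_2^2+cx_1x_2$, and $\u_1=-\tfrac{1}{2}x_1^2+bx_3$; we let $\u_2$ be independent of $x_3$ so that all best responses are available in closed form. Here $L=3$, $l=1$, and the path from the leaf to the root visits $\eta=3$ and $\eta=2$, which contribute the factors $(-1)^{-1}a$ and $(-1)^{-1}c$, respectively. Hence
\begin{equation*}
D_{x_1}\u_1=-x_1+(-1)^{2}\,b\,(-a)(-c)=-x_1+abc,
\end{equation*}
in agreement with substituting the explicit best responses $\phi_2(x_1)=cx_1$ and $\phi_3(x_2)=ax_2$ into $\u_1$ and differentiating $-\tfrac{1}{2}x_1^2+abc\,x_1$.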

While the above derivation assumes the $\phi$ and $\Phi$ functions are local best responses, in our algorithm in each iteration we evaluate these functional expressions for the total derivatives \emph{at the current joint action profile}. 
This significantly reduces computational complexity and ensures that Algorithm~\ref{alg:dbw} satisfies the first-order conditions upon convergence.

%\input{convergence}
\subsection{Convergence Analysis}\label{sec:analysis}
As we remarked earlier, stable points of DBI are not guaranteed to be SPE just as stable points of gradient ascent are not guaranteed to be globally optimal with general non-convex objective functions.
Furthermore, the DBI algorithm entails what are effectively iterative better-response updates by players, and it is well known that best-response dynamics in games will in general lead to cycles~\citep{mazumdar2020gradient}.

In spite of these challenges, we provide sufficient conditions for the DBI algorithm to converge to a stable point. In particular, in the rest of this section, we first show that the gradient updates of DBI can be written as a dynamical system and characterize the conditions under which this system converges to a stable point (Proposition~\ref{prop:suff}). We then show how DBI can be tuned (in terms of the learning rate in Proposition~\ref{thm: lr}, the number of iterations in Proposition~\ref{thm: convergence}, and the initialization in Proposition~\ref{thm:measure}) to converge to such stable points when they exist. 
While the set of stable points and approximate SPEs are not necessarily the same, we empirically show that DBI is effective in converging to SPEs.

To begin, we observe that the gradient updates in
DBI can be interpreted as a discrete dynamical system, $\bx^{t+1}=F(\bx^t)$, with $F(\bx^t)=(\bm{I}+\alpha G)(\bx^{t})$ where $G$ is an update gradient vector.
This discrete system can be viewed as an approximation of a continuous limit dynamical system
$\dot{\bx}=G(\bx)$ 
(i.e., letting $\alpha\rightarrow0$).
A standard solution concept for such dynamical systems is a \emph{locally asymptotic stable point (LASP)}.

\begin{definition}[\cite{galor2007discrete}]
A continuous (or discrete) dynamical system $\dot{\bx}=G(\bx)$ (or $\bx^{t+1}=F(\bx^t)$) has a locally asymptotic stable point (LASP) $\bx^*$ if there exists $\epsilon>0$ such that $\lim_{t\rightarrow\infty}\bx^t=\bx^*$ for all $\bx^0\in\ball(\bx^*)$.
\end{definition}

There are well-known necessary and sufficient conditions for the existence of an LASP.
\begin{proposition}[Characterization of LASP~{\citep[Theorem~1.2.5, Theorem~3.2.1]{wiggins2003introduction}}]
A point $\bx^*$ is an LASP for the continuous dynamical system $\dot{\bx}=G(\bx)$ if $G(\bx^*)=0$ and all eigenvalues of the Jacobian matrix $\nabla_{\bx}G$ at $\bx^*$ have negative real parts.
Furthermore, for any $\bx^*$ such that $G(\bx^*)=0$, if $\nabla_{\bx}G$ has eigenvalues with positive real parts at $\bx^*$, then 
$\bx^*$ cannot be an LASP.
\label{prop:suff}
\end{proposition}
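To illustrate Proposition~\ref{prop:suff}, consider a hypothetical two-dimensional linear system, chosen purely for illustration and not one of the games studied here:
\begin{equation*}
G(\bx)=A\bx,\qquad A=\begin{pmatrix}-1 & 2\\ -2 & -1\end{pmatrix}.
\end{equation*}
Its only zero is $\bx^*=0$, and the Jacobian equals $A$ everywhere. The eigenvalues solve $(\lambda+1)^2+4=0$, i.e., $\lambda=-1\pm 2\mathrm{i}$. Both have negative real parts, so $\bx^*=0$ is an LASP, and the nonzero imaginary parts make trajectories spiral inward. Flipping the sign of the diagonal entries yields eigenvalues $1\pm 2\mathrm{i}$, in which case $\bx^*=0$ cannot be an LASP.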

Note that an LASP of DBI is an action profile of all players that satisfies the first-order conditions, i.e., it has the property that no player can improve their utility through a local gradient update.
While the existence of an LASP depends on game structure, we show that under Assumption~\ref{asp:1}, and as long as the sufficient conditions for LASP existence in Proposition~\ref{prop:suff} are satisfied, DBI converges to an LASP. 
We defer all the omitted proofs to \ifcr the long version.\else Appendix~\ref{sec:app-analysis}.\fi
\begin{proposition}
Let $\lambda_1,\dotsc,\lambda_d$ denote the eigenvalues of the updating Jacobian $\nabla_{\bx}G$ at an LASP $\bx^*$ and define $\lambda^*=\arg\max_{\lambda\in\{\lambda_1,\dotsc,\lambda_d\}}Re(\lambda)/\abs{\lambda}^2$, where $Re$ is the real part operator.
Then with a learning rate $\alpha<-2Re(\lambda^*)/\abs{\lambda^*}^2$ and an initial point $\bx^0\in\mathbb{B}_\epsilon(\bx^*)$ for some $\epsilon>0$, DBI converges to the LASP $\bx^*$. Specifically, if the learning rate equals $\alpha^*$ and the spectral radius satisfies $\rho(\bm{I}+\alpha^*\nabla_{\bx} G)=1-\kappa<1$, then the dynamics converge to $\bx^*$ at a rate of $O((1-\kappa/2)^t)$.\label{thm: lr}
\end{proposition}
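As a hypothetical numerical illustration of Proposition~\ref{thm: lr}, suppose that $d=2$ and that the Jacobian at $\bx^*$ has eigenvalues $-1\pm 2\mathrm{i}$ (values assumed only for this example). Then $Re(\lambda_i)/\abs{\lambda_i}^2=-1/5$ for both eigenvalues, so the bound reads $\alpha<2/5$. This can be checked directly, since
\begin{equation*}
\abs{1+\alpha\lambda_i}^2=(1-\alpha)^2+4\alpha^2=1-2\alpha+5\alpha^2,
\end{equation*}
which is below $1$ exactly when $0<\alpha<2/5$. The expression is minimized at $\alpha=1/5$, where it equals $4/5$. Taking $\alpha^*=1/5$ as the chosen learning rate therefore gives $\rho=\sqrt{4/5}\approx 0.894$, hence $\kappa\approx 0.106$ and a rate of roughly $O(0.947^t)$.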

Proposition~\ref{thm: lr} states that there exists a region such that, if the initial point is in that region, then DBI will converge to an LASP. We next show that if we assume first-order Lipschitzness for the update rule, then we can also characterize the region of initial points which converge to an LASP.

\begin{proposition}
Suppose $G$ is $L$-Lipschitz.\footnote{Formally, this means that $\exists L > 0$ such that $\forall \bx, \bx^\prime \in\mathcal{X}, \|G(\bx)-G(\bx^\prime)\|_2\le L\|\bx-\bx^\prime\|_2$.} Then for all $\bx^0\in\mathbb{B}_{\kappa/2L}(\bx^*)$ and all $\epsilon > 0$, after $T$ rounds of gradient updates DBI will output a point $\bx^T\in \ball(\bx^*)$ as long as $T\ge\left\lceil \frac{2}{\kappa}\log\frac{\norm{\bx^0-\bx^*}}{\epsilon}\right\rceil$, where $\kappa$ is as defined in Proposition~\ref{thm: lr}.
\label{thm: convergence}
\end{proposition}
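To give a sense of scale for the bound in Proposition~\ref{thm: convergence}, take the hypothetical values $\kappa=0.1$ and $L=1$, read the radius $\kappa/2L$ as $\kappa/(2L)=0.05$, and read $\log$ as the natural logarithm (both readings are assumptions made for this example). Starting on the boundary of the ball, $\norm{\bx^0-\bx^*}=0.05$, and targeting $\epsilon=10^{-4}$, the bound gives
\begin{equation*}
T\ge\left\lceil \frac{2}{0.1}\log\frac{0.05}{10^{-4}}\right\rceil=\left\lceil 20\log 500\right\rceil=\left\lceil 124.3\right\rceil=125 .
\end{equation*}
Since $\epsilon$ enters only through the logarithm, tightening the target by a factor of $10$ adds about $20\log 10\approx 46$ iterations in this example.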

\begin{figure*}[ht!]
\centering
\subfigure[]{\includegraphics[width=0.685\columnwidth]{figures/figure.pdf}}
\subfigure[]{\includegraphics[width=0.685\columnwidth]{figures/figure2.pdf}}
\subfigure[]{\includegraphics[width=0.685\columnwidth]{figures/figure_multi.pdf}}
\caption{Convergence behavior on (a) a $(1, 1, 1)$ game with 1-d actions, (b) a $(1, 1, 2)$ game with 1-d actions, and (c) a $(1, 1, 1)$ game with 3-d actions. 
}
\label{fig:conv}
\end{figure*}

We further show that through random initialization, the probability of reaching a \textit{saddle point} is 0, which means that with probability 1, DBI converges to an LASP in which players are playing \emph{local} best responses.
\begin{proposition}
Suppose $G$ is $L$-Lipschitz. 
Let $\alpha<1/L$ and define the saddle points of the dynamics $G$ as $\mathcal{X}^*_{sad}=\{\bx^*\in\mathcal{X} \mid \bx^*=(\bm{I}+\alpha G)(\bx^*),\rho((\bm{I}+\alpha \nabla_{\bx}G)(\bx^*))>1\}$. Also let $\mathcal{X}^0_{sad}=\{\bx^0\in\mathcal{X} \mid \lim_{t\rightarrow\infty}(\bm{I}+\alpha G)^t(\bx^0)\in \mathcal{X}^*_{sad}\}$ denote the set of initial points that converge to a saddle point.
Then $\mu(\mathcal{X}^0_{sad})=0$, where $\mu$ is the Lebesgue measure.\label{thm:measure}
\end{proposition}
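A simple hypothetical example conveys the intuition behind Proposition~\ref{thm:measure}. Take $\mathcal{X}=\mathbb{R}^2$ and $G(\bx)=(-x_1,\,x_2)$, which is $1$-Lipschitz, and let $\alpha<1$ (this system is assumed only for illustration). The update map and its iterates are
\begin{equation*}
(\bm{I}+\alpha G)(\bx)=\big((1-\alpha)x_1,\,(1+\alpha)x_2\big),\qquad (\bm{I}+\alpha G)^t(\bx^0)=\big((1-\alpha)^t x^0_1,\,(1+\alpha)^t x^0_2\big).
\end{equation*}
The only fixed point is $\bx^*=0$, and the spectral radius there is $1+\alpha>1$, so $\bx^*\in\mathcal{X}^*_{sad}$. The iterates converge to $\bx^*$ only if $x^0_2=0$, so $\mathcal{X}^0_{sad}$ is the line $\{\bx : x_2=0\}$, which has Lebesgue measure zero in $\mathbb{R}^2$.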

While our convergence analysis does not guarantee convergence to an approximate SPE, our experiments show that DBI is in fact quite effective in doing so in practice.

%\input{exp}

\begin{figure*}[ht!]
\centering
\subfigure[]{\includegraphics[width=0.68\columnwidth]{figures/figure1_eps.pdf}}
\subfigure[]{\includegraphics[width=0.68\columnwidth]{figures/figure2_eps.pdf}}
\subfigure[]{\includegraphics[width=0.68\columnwidth]{figures/figure3_eps.pdf}}
\caption{Solution quality on (a) a $(1, 1, 1)$ game with 1-d actions, (b) a $(1, 1, 2)$ game with 1-d actions, and (c) a $(1, 1, 1)$ game with 3-d actions. 
}
\label{fig:local-eps}
\end{figure*}

\section{Experiments}\label{sec:exp}

In this section, we empirically investigate the following questions: (1) the convergence rate of DBI, (2) the solution quality of DBI, and (3) the behavior of DBI in games where we can verify global stability.
All our code is written in Python. %and attached.
We ran our experiments on an Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz to obtain the results in Section~\ref{sec:polygames}, and on an Intel(R) Core(TM) i9-9820X CPU @ 3.30GHz for the rest of the experiments.\footnote{Code available at \url{https://github.com/jtongxin/SHG_DBI}.}

We evaluate the performance in terms of quality of equilibrium approximation as a function of the number of iterations of a given algorithm, or its running time.
Ideally, given a collection of actions $\bm{x}$ played by players along the (approximate) equilibrium path computed, we wish to find the largest utility gain any player can have by deviating from this path, which we denote by $\epsilon(\bm{x})$.
However, this computation is impossible in our setting, as it would need to consider all possible histories as well, whereas our approach and alternatives only return $\bm{x}$ along the path of play (moreover, considering all possible histories is itself intractable).

Therefore, we consider two heuristic alternatives.
The first, which we call \emph{local SPE regret}, runs DBI for every player $i$ starting with $\bm{x}$, and returns the greatest benefit that any player can thereby obtain; we use this in Section~\ref{sec:polygames}.
In the rest of this section, we use the second alternative, which we call \emph{global SPE regret}. It considers for each player $i$ in level $l$ a discrete grid of alternative actions, and uses best response dynamics to compute an approximate SPE of the level-$(l+1)$ subgame to evaluate player $i$'s utility for each such deviation.
This approach then returns the highest regret among all players computed in this way.
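
One way to write the global SPE regret described above, in notation introduced only for this sketch, is as follows. Let $\widehat{\mathcal{X}}_i$ denote the discrete grid of alternative actions for player $i$ in level $l$, and let $\tilde{\bm{x}}(x_i^\prime)$ denote the action profile obtained when player $i$ deviates to $x_i^\prime$ and the level-$(l+1)$ subgame is re-solved approximately by best response dynamics. Then
\begin{equation*}
\epsilon_{\mathrm{global}}(\bm{x})=\max_{i}\;\max_{x_i^\prime\in\widehat{\mathcal{X}}_i}\;\Big[u_i\big(\tilde{\bm{x}}(x_i^\prime)\big)-u_i(\bm{x})\Big].
\end{equation*}
The symbols $\widehat{\mathcal{X}}_i$, $\tilde{\bm{x}}(\cdot)$, and $\epsilon_{\mathrm{global}}$ are assumptions of this sketch; the value obtained depends on the grid resolution and on how accurately the subgame is re-solved.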

Our evaluation considers four SHG scenarios.
We begin by comparing DBI to a number of baselines on simple, stylized SHG models, then
move on to three complex hierarchical game models motivated by concrete applications.
\subsection{Polynomial Games}
\label{sec:polygames}

We begin by considering instances of SHGs to which we can readily apply several state-of-the-art baselines, allowing us a direct comparison to previous work.
Specifically, we consider three SHG instances with different game properties: (a) a three-level chain structure (or the $(1, 1, 1)$ game) with 1-d actions, (b) a ``$\Yup$''-shaped tree (or the $(1, 1, 2)$ game) with 1-d actions, and (c) a $(1, 1, 1)$ game with 3-d actions. 
In all the games, the payoffs are polynomial functions of $\bx$ with randomly generated coefficients (we can think of these as proxies for a Taylor series approximation of actual utility functions). 
The exact coefficients of these polynomial functions as well as an analysis of the running time of each method can be found in \ifcr the long version. \else Appendix~\ref{sec:app-exp}. \fi 

We compare DBI with the following five baselines: 1) simultaneous partial gradient ascent (SIM)~\citep{chasnov2020convergence,mazumdar2020gradient}, 2) symplectic gradient dynamics with or 3) without alignment (SYM\_ALN and SYM, respectively)~\citep{balduzzi2018mechanics}, 4) consensus optimization (CO)~\citep{mescheder2017numerics}, and 5) Hamiltonian gradient (HAM)~\citep{loizou2020stochastic,abernethy2021last}.
SIM, SYM\_ALN, SYM, CO and HAM are all designed to compute a local Nash equilibrium~\cite{balduzzi2018mechanics,chasnov2020convergence}.

We start by comparing convergence behavior of DBI to the baselines.
We run all algorithms with the same initial point and learning rate. 
The results are in Figure~\ref{fig:conv} where we plot the $L_2$ norm of total gradient for each of the algorithms (Y axis)  against the number of iterations (X axis).

In all cases, DBI converges to a critical point that meets the first-order conditions, while the baseline algorithms fail to do so in most cases. In Figures~\ref{fig:conv}(a) and (c), all baselines converge to a point at which the norm of the total gradient is finite but nonzero.
In (b), however, only CO and HAM converge to a stationary point while SIM, SYM, SYM\_ALN all diverge. For scenario (b),  DBI appears to be on an inward spiral to a critical point.
We further check the second-order condition (see \ifcr the long version\else  Appendix~\ref{sec:app-exp}\fi) and verify that DBI has actually converged to local maxima of individual payoffs in all three games. 

Next, we investigate solution quality in terms of \emph{local regret} of DBI compared to baselines.
As shown in Figure~\ref{fig:local-eps},
across all three game instances, DBI outputs a profile of actions (along the path of play) with near-zero local regret, while the other algorithms fail to do so. 

\subsection{Decentralized Epidemic Policy Game}\label{sec:exp3}

Next, we consider DBI for solving a class of games inspired by hierarchical decentralized policy-making in the context of epidemics such as COVID-19~\citep{jia2021game}.
The hierarchy has levels corresponding to the (single) federal government, multiple states, and county administrations under each state.
Each player's action (policy) is a scalar in $[0, 1]$ that represents, for example, the extent of social distancing recommended or mandated by a player (e.g., a state) for its administrative subordinates (e.g., counties).
Crucially, these subordinates have considerable autonomy about setting their own policies, but incur a non-compliance cost for significantly deviating from recommendations made by the level immediately above (of course, non-compliance costs are not relevant for the root player).
The full cost function of each player additionally includes the infection prevalence within the geographic territory of interest to the associated entity (e.g., within the state), as well as the socio-economic cost of the policy itself. To summarize, the total cost for each player is a combination of the infection cost, the socio-economic cost, and the non-compliance cost (when applicable). However, different players can weigh these costs differently (through player-specific weights for each of the costs), which can lead to strategic tensions between the players (see \ifcr the long version \else Appendix~\ref{sec:app-exp} \fi for details).

\begin{figure}[ht!]
\begin{tabular}{ll}
\includegraphics[width=0.91\columnwidth]{figures/gradient.pdf}\\
\includegraphics[width=0.9\columnwidth]{figures/gradient2-new.pdf}
\end{tabular}
\caption{Global regret for the decentralized epidemic policy game. Top and bottom rows correspond to results for games with 2 and 3 levels, respectively.}
\label{fig:eps1}
\end{figure}

Since the actions are in a one-dimensional compact space and the depth of the hierarchy is at most 3, 
our baseline is the best response dynamics (BRD) algorithm proposed by \citet{jia2021game} (detailed in \ifcr the long version\else Appendix~\ref{sec:app-exp}\fi), and we use \emph{global regret} as a measure of efficacy in comparing the proposed DBI algorithm with BRD.
The results of this comparison are shown in Figures~\ref{fig:eps1}~and~\ref{fig:eps2} for two-level (government and states) and three-level (government, states, counties) variants of this game. We consider two-level games with 20 and 50 leaves (states), and three-level games with 2 players in level 2 (states) and 4 and 10 leaves (counties).


\begin{figure}[ht!]
\begin{tabular}{ll}
\includegraphics[width=0.9\columnwidth]{figures/run_time_f3a.pdf}\\
\includegraphics[width=0.9\columnwidth]{figures/run_time_f3b-new.pdf}
\end{tabular}
\caption{Running time for the decentralized epidemic policy game. Top and bottom rows correspond to results for games with 2 and 3 levels, respectively.}
\label{fig:eps2}
\end{figure}


As we can see in Figure~\ref{fig:eps1}, BRD can have poor convergence behavior in terms of global regret, whereas DBI appears to converge quite reliably to a path of play with a considerably lower global regret.
Notably, the improvement in solution quality becomes more substantial as we increase the game complexity either in terms of scale (number of leaves) or in terms of the level of hierarchy (moving from 2- to 3-level games).

Running time (in seconds) demonstrates the relative efficacy of DBI even further (see Figure~\ref{fig:eps2}).
In particular, observe the significant increase in the running time of BRD as we increase the number of leaves.
In contrast, DBI is far more scalable: indeed, even more than doubling the number of players appears to have little impact on its running time.
Moreover, BRD is several orders of magnitude slower than DBI for the more complex games.

\subsection{Hierarchical Public Goods Games}\label{sec:pgg}
\begin{figure}[ht!]
\centering
\includegraphics[width=0.9\columnwidth]{figures/networkedPGG_res_new.pdf}
\caption{Performance ($\epsilon$) in the Public Goods Game; the scatter points show the results of BRD with discretization factors $0.5, 0.2, 0.1, 0.05$, and best response rounds $2,3$.}\label{fig:pgg}
\end{figure}

Next, we consider \textit{hierarchical public goods games}.
A conventional networked public goods game endows each player $i$ with a utility function 
$u_i(x_i,x_{-i})=a_i + b_i x_i + \sum_{j} g_{ji} x_i x_j-c_i(x_i)$, where $g_{ji}$ is the impact of player $j$ on player $i$ (often represented as a weighted edge on a network), and $x_i \in [0,1]$ is the level of investment in the public good by player $i$~\citep{bramoulle2007public}. 
We construct a 3-level hierarchical variant of such games by starting with the karate club network~\citep{zachary1977information} which represents friendships among 34 individuals.
Level-2 nodes are obtained by partitioning the network into two (sub)clubs, with leaves (level-3 nodes) representing all the individuals.
The utility of level-2 nodes is the sum of utilities of individual members of associated clubs, with the utility of the root being the sum of the utilities of all individuals.
Furthermore, we introduce non-compliance costs with investment policies in the level immediately above, as we did in the decentralized epidemic policy game (Section~\ref{sec:exp3}).
Further details on the exact form of the utility functions and parameters of the games are provided in \ifcr the long version. \else Appendix~\ref{app:pgg}.\fi
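
For intuition about the conventional (non-hierarchical) game, the utility function above can be differentiated with respect to $x_i$. Assuming that the sum runs over $j\neq i$ and that $c_i$ is differentiable (both are assumptions made for this sketch), we get
\begin{equation*}
\frac{\partial u_i}{\partial x_i}=b_i+\sum_{j\neq i}g_{ji}x_j-c_i^\prime(x_i),
\end{equation*}
so an interior best response satisfies $c_i^\prime(x_i)=b_i+\sum_{j\neq i}g_{ji}x_j$. With a hypothetical quadratic cost $c_i(x_i)=x_i^2/2$, the utility is concave in $x_i$ and the best response is $b_i+\sum_{j\neq i}g_{ji}x_j$ clipped to $[0,1]$; a positive $g_{ji}$ thus raises player $i$'s investment when player $j$ invests more.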

Figure~\ref{fig:pgg} presents the global regret as a function of running time for DBI (black line) and BRD with different levels of discretization (dots). We observe that DBI yields considerably lower regret in these games than BRD even as we discretize the latter finely.
Moreover, DBI reaches smaller regret orders of magnitude faster than BRD.

\begin{figure*}[ht!]
\centering
\subfigure[Security game $\kappa = 0.1$.]{\includegraphics[width=1\columnwidth]{figures/SecGame0.1_res_new.pdf}\label{fig:sec_games_p1}}~
\subfigure[Security game $\kappa = 0.5$.]{\includegraphics[width= 0.8\columnwidth]{figures/SecGame0.5_res_new.pdf}\label{fig:sec_games_p5}}
\caption{Results on $(1,3,6)$ hierarchical security games. (a) $\kappa=0.1$ and (b) $\kappa = 0.5$; legend is shared. }
\end{figure*}

\subsection{Hierarchical Security Games}\label{sec:secgame}
In the final set of experiments, we evaluate DBI on a hierarchical extension of \textit{interdependent security games}~\citep{bachrach2013contagion}.
In these games, $n$ defenders can each invest $x_i \ge 0$ in security.
If defender $i$ is attacked, the probability that the attack succeeds is $1/(1+x_i)$.
Furthermore, defenders are interdependent, so that a successful attack on defender $i$ cascades to defender $j$ with probability $q_{ji}$.
In the variant we adopt, the attacker strategy is a uniform distribution over defenders (e.g., the ``attacker'' is just nature, with attacks representing stochastic exogenous failures).
The utility of each defender is the probability of surviving the attack less the cost of its security investment.
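
To see the trade-off in the simplest case, consider a hypothetical special case without cascades ($q_{ji}=0$) and with a linear investment cost $c\,x_i$, $c>0$; both are simplifying assumptions for illustration only. Defender $i$ is attacked with probability $1/n$ and the attack succeeds with probability $1/(1+x_i)$, so
\begin{equation*}
u_i(x_i)=1-\frac{1}{n(1+x_i)}-c\,x_i,\qquad \frac{\partial u_i}{\partial x_i}=\frac{1}{n(1+x_i)^2}-c .
\end{equation*}
Since $u_i$ is concave in $x_i$, the optimal investment is $x_i=1/\sqrt{nc}-1$ when $nc<1$ and $x_i=0$ otherwise. For example, with $n=6$ and $c=0.01$, we get $x_i\approx 3.08$, which leaves an attack success probability of $1/(1+x_i)=\sqrt{nc}\approx 0.245$.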

We extend this simultaneous-move game to a hierarchical structure consisting of one root player (e.g., government), three level-2 players (e.g., sectors), and six leaf players (e.g., organizations).
The policy-makers in the first two levels of the game recommend an investment policy to the level below, and aim to maximize total welfare (sum of utilities) among the leaf players in their subtrees.
Just as in both hierarchical epidemic and public goods games, whenever a player in level $l$ does not act according to the recommendation of their parent in level $l-1$, they incur a non-compliance cost.
Complete model details are deferred to \ifcr the long version. \else Appendix~\ref{app:sec}. \fi
We conduct experiments with two weights $\kappa$ that determine the relative importance of non-compliance costs in the decisions of non-root players in the game: $\kappa \in \{0.1, 0.5\}$.


Figures~\ref{fig:sec_games_p1} and \ref{fig:sec_games_p5} present the results of comparing DBI with BRD on this class of games, where BRD is again evaluated with different levels of action space discretization (note, moreover, that in this setting discretizing actions is not enough, since these are unbounded, and we also had to impose an upper bound).
We can observe that for either value of $\kappa$, DBI yields high-quality SPE approximation (in terms of global SPE regret) far more quickly than BRD.
In particular, when we use relatively coarse discretization, BRD is approximately an order of magnitude slower, and yields significantly higher regret.
In contrast, if we use finer discretization for BRD, global regret for BRD and DBI becomes comparable, but now BRD is several orders of magnitude slower.
For example, DBI converges within several seconds, whereas if we discretize $x_i$ into multiples of 0.02, BRD takes nearly 2 hours, while discretization at the level of 0.01 results in BRD taking nearly 7 hours.

\section{Conclusion}\label{sec:disc}
%\input{discussion}
We introduced a novel class of hierarchical games, proposed a new game-theoretic solution concept and designed an algorithm to compute it. We assume a specific form of utility dependency between players and our solution concept only guarantees local stability. Improvement on each of these two fronts is an interesting direction for future work.

Given the generality of our framework, our approach can be used for many applications characterized by a hierarchy of strategic agents, e.g., pandemic policy-making. However, our model requires full knowledge of the true utility functions of all players, and our analysis assumes full rationality of all players. Although the model we have addressed here is already challenging, these assumptions are unlikely to hold in many real-world applications. Therefore, further analysis is necessary to fully gauge the robustness of our approach before deployment.





\section*{Acknowledgments}
This work was supported in part by the US Army Research Office under MURI grant \# W911NF-18-1-0208.

%\bibliographystyle{plainnat}
\bibliography{ref}

\ifcr \else 
\newpage\clearpage
\onecolumn
\appendix
\section{Omitted Details from Section~\ref{sec:framework}}\label{sec:app-spe}
\input{app-spe}
\section{Omitted Details from Section~\ref{sec:analysis}}\label{sec:app-analysis}
\input{app-convergence}
\section{Omitted Details from Section~\ref{sec:exp}}\label{sec:app-exp}
\input{app-exp}
\fi

\end{document}
