\documentclass[accepted]{uai2023} 

\usepackage[american]{babel}

%% Some suggested packages, as needed:
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
\usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams

%%% Load required packages here (note that many are included already).
\usepackage{float}
\usepackage{soul}
\usepackage{url}
\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{subcaption}

\usetikzlibrary{calc}
\usepackage{pgfplots}
\usepackage{pgfplotstable}
\pgfplotsset{compat=1.16}

\usepackage{balance} % for balancing columns on the final page
% \usepackage{booktabs}
\setlength{\heavyrulewidth}{1.5pt}
\setlength{\abovetopsep}{4pt}
% \usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{calrsfs}
\usepackage{enumitem}
\usepackage[c3, nocomma]{optidef}
\usepackage{cuted}

\usepackage{array,longtable,tabularx,tabulary}
\newcolumntype{L}{>{\raggedright\arraybackslash}X}
\usepackage{ltablex}
\usepackage{siunitx}

\usepackage{xr}
\externaldocument{nagorko_531}

\usepackage{amsthm}
\theoremstyle{definition}
\newtheorem{example}{Example}[section]

\theoremstyle{remark}
\newtheorem{remark}{Remark}[section]
\newtheorem{proposition}{Proposition}[section]
\newtheorem{corollary}{Corollary}[section]


%% Provided macros
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)

%% Self-defined macros
\newcommand{\swap}[3][-]{#3#1#2} % just an example

\newcommand{\subsectioninline}[1]{\noindent \textbf{#1:}}

\DeclareMathOperator{\Prob}{Prob}
\DeclareMathOperator{\Pure}{Pure}
\DeclareMathOperator*{\E}{E}
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\newcommand{\C}{{\mathcal{C}}}

\title{Two-phase Attacks in Security Games (Supplementary material)}

% The standard author block has changed for UAI 2023 to provide
% more space for long author lists and allow for complex affiliations
%
% All author information is automatically removed by the class for the
% anonymous submission version of your paper, so you can already add your
% information below.
%
% Add authors
\author[1,2]{\href{mailto:<amn@mimuw.edu.pl>?Subject=Two-phase Attacks in Security Games (UAI 2023)}{Andrzej~Nagórko}{}}
\author[2]{\href{mailto:<pawel.ciosmak@ideas-ncbr.pl>?Subject=Two-phase Attacks in Security Games (UAI 2023)}{Paweł~Ciosmak}{}}
\author[2,3]{\href{mailto:<tomasz.michalak@ideas-ncbr.pl>?Subject=Two-phase Attacks in Security Games (UAI 2023)}{Tomasz Michalak}{}}
% Add affiliations after the authors
\affil[1]{%
    Department of Mathematics, University of Warsaw, ul. Banacha 2, 02-097 Warsaw, Poland
}
\affil[2]{%
    Ideas NCBR, ul. Chmielna 69, 00-801 Warsaw, Poland
}
\affil[3]{%
    Department of Computer Science, University of Warsaw, ul. Banacha 2, 02-097 Warsaw, Poland
}

\begin{document}
  
\onecolumn %% Turn this off if single column is desired for the supplement
\maketitle

\appendix

\section{Notational conventions}

Throughout the paper, we use variable \emph{superscripts} to denote parameters of probability distributions, e.g., $\{ y^t \}_{1 \leq t \leq n}$ denotes a family of the attacker's first-move strategies, indexed by attacker type $t$.

The dependence of one probability distribution on another is usually left implicit, but when we want to state explicitly that $y^t$ is chosen with knowledge of $x$ (so that the optimal $y^t$ changes with $x$), we write $y^t(x)$ in \emph{functional} notation.

We use variable \emph{subscripts} to denote values of probability distributions (i.e., we use \emph{matrix} notation), e.g., we write $x_i$ for the probability assigned to move $i \in I$ by the probability distribution $x \in \Prob(I)$.
We subscript player payoffs likewise, e.g., $r_{i, t, j}$ is the defender's payoff after playing move $i$ against an attacker of type $t$ who played move $j$.

Parametrized set families are subscripted, since set subscripts have no other use, e.g., $\C_{t, j}$ is the set of possible payoffs of an attacker of type $t$ after he played move $j$.

We often use combinations of the variables $i, t, j, k$ as subscripts, always in this order, which matches the order in which the values of these variables are chosen (see Section~\ref{sec:our_model}).

\section{DOBSS}

From the discussion in Section~\ref{sec:preliminaries} we can derive the following quadratic program for one-phase Bayesian Stackelberg games.
\begin{maxi}{x, y^t}{\sum_{i \in I} \sum_{t = 1}^n \sum_{j \in J_t} p_t x_i y^t_j r_{i, t, j}, \hspace{2cm}}{}{\label{bsg mqlp}}
\addConstraint{\sum_{i \in I} x_i = 1,}
\addConstraint{\sum_{j \in J_t} y^t_j = 1}{}{\text{ for each } 1 \leq t \leq n,}
\addConstraint{\begin{aligned}\sum_{i \in I} \sum_{j \in J_t} x_i y^t_j c_{i, t, j} \geq \quad \\
\geq \sum_{i \in I} x_i c_{i, t, j}\quad\end{aligned}}{}{\text{ for each } 1 \leq t \leq n, j \in J_t,}
\addConstraint{x \geq 0, y^t \geq 0}{}{\text{ for each } 1 \leq t \leq n.}
\end{maxi}
It is a quadratic program because it contains the non-linear terms $x_i y^t_j$.
Unless $\mathrm{P} = \mathrm{NP}$, we cannot expect a linear program (LP) formulation of polynomial size, as Bayesian Stackelberg games are known to be NP-hard~\citep{Conitzer2006Computing}.
However, there are two standard ways to deal with the non-linear terms, which we describe next because they are relevant to the solution of the two-phase games studied in this paper.
%\begin{figure}[ht]
%\begin{maxi}[3]{x, y^t}{\sum_{i \in I} \sum_{t = 1}^n \sum_{j \in J_t} p_t x_i y^t_j r_{i, t, j}, \phantom{\quad\quad\quad\quad}}{}{\label{bsg miqp}}
%\addConstraint{\sum_{i \in I} x_i}{= 1,}
%\addConstraint{\sum_{j \in J_t} y^t_j = 1,}{ 1 \leq t \leq n,}
%\addConstraint{\sum_{i \in I} \sum_{j \in J_t} x_i y^t_j c_{i, t, j} \geq 
%\sum_{i \in I} x_i c_{i, t, j},
%}{1 \leq t \leq n, j \in J_t,}
%\addConstraint{x \geq 0, y^t \geq 0 \text{ for each } 1 \leq t \leq n.}
%\end{maxi}
%\end{figure}

\subsection{Harsanyi transformation}
\label{ssec:harsanyi}

If there is only one attacker type, then a linear relaxation of DOBSS (with constraints $y^t_j \in \{ 0, 1 \}$ dropped) computes an optimal strategy of the defender:
Stackelberg games with one type of attacker are solvable in polynomial time~\citep{Conitzer2006Computing}.
A Bayesian Stackelberg game can always be transformed into a Stackelberg game with a single attacker type (a normal form) using the Harsanyi transformation, at the cost of an exponential blow-up of the problem size.

In the normal form, the set of moves of the single attacker is a set $J$ of sequences $(j_1, j_2, \ldots, j_n)$ with $j_t \in J_t$, $1 \leq t \leq n$.
For move $j \in J$, the defender's payoff for move $i \in I$ is $r_{i, j} = \sum_{t = 1}^n p_t r_{i, t, j_t}$ and attacker's payoff is $c_{i, j} = \sum_{t = 1}^n p_t c_{i, t, j_t}$.
In other words, the single attacker in a normal-form game selects in a single move attacks for all the attacker's types. The payoffs are the expected payoffs when the probability distribution over the types of the attacker is $\{ p_t \}$.
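
To make the size of the normal form concrete, consider a small illustration (the instance below, with two attacker types and moves named $a$ and $b$, is our own assumption and is not taken from the paper). Let $n = 2$ and $J_1 = J_2 = \{ a, b \}$. Then
\begin{align*}
  J &= \{ (a, a), (a, b), (b, a), (b, b) \}, \\
  r_{i, (a, b)} &= p_1 r_{i, 1, a} + p_2 r_{i, 2, b}, \\
  c_{i, (a, b)} &= p_1 c_{i, 1, a} + p_2 c_{i, 2, b}.
\end{align*}
In general $|J| = \prod_{t = 1}^n |J_t|$, so if every type has $m$ moves, the single attacker has $m^n$ moves.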

It turns out that the two-phase Bayesian Stackelberg games studied in this paper can be transformed into Bayesian Stackelberg games using a similar transformation. Here, too, this results in an exponential blow-up of the problem size.
We describe this in detail in Section~\ref{sec:comparison}.

The Harsanyi transformation is not an effective approach to Bayesian Stackelberg games.
DOBSS solves the problem exponentially faster, even if the entire branch-and-bound tree is explored in the solution of the mixed integer linear program~\citep{paruchuri2008playing}.
As we discuss in Section~\ref{sec:comparison}, the situation is even worse in the case of two-phase games.

\section{Linearization of piecewise-linear problems}
\label{ssec:linearization}

Since attackers have optimal pure strategies, we may, without loss of generality, add the constraints $y^t_j \in \{ 0, 1 \}$ for each $1 \leq t \leq n$, $j \in J_t$ to problem~\eqref{bsg mqlp}.
Then, for the non-linear terms $x_i y^t_j$, $j \in J_t$, we may introduce new variables $a^t_{i, j}$ and constraints
\begin{align*}
  0 \leq a^t_{i, j} \leq y^t_j & \text{ for each } i \in I, j \in J_t, \\
  \sum_{j \in J_t} a^t_{i, j} = x_i & \text{ for each } i \in I.
\end{align*}
Since $y^t \in \Prob(J_t)$ and $y^t_j \in \{ 0, 1 \}$, in any feasible solution we have $a^t_{i, j} = x_i y^t_j$.
We substitute $a^t_{i, j}$ for each occurrence of $x_i y^t_j$ in problem~\eqref{bsg mqlp} to get a mixed integer linear program (MILP) formulation of~\eqref{bsg mqlp}.
This is the celebrated DOBSS algorithm~\citep{pita2009using}.
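
The identity $a^t_{i, j} = x_i y^t_j$ can be checked directly; the following short verification is ours, with $j^*$ an auxiliary symbol for the unique index such that $y^t_{j^*} = 1$. For a fixed type $t$ and each $i \in I$,
\begin{align*}
  0 \leq a^t_{i, j} \leq y^t_j = 0 \quad &\Longrightarrow \quad a^t_{i, j} = 0 = x_i y^t_j \quad \text{ for each } j \neq j^*, \\
  \sum_{j \in J_t} a^t_{i, j} = x_i \quad &\Longrightarrow \quad a^t_{i, j^*} = x_i - \sum_{j \neq j^*} a^t_{i, j} = x_i = x_i y^t_{j^*}.
\end{align*}
The remaining bound $a^t_{i, j^*} \leq y^t_{j^*} = 1$ holds automatically because $x_i \leq 1$.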

In the paper, we exploit the observation that similar substitutions may be performed for any piecewise-linear problem,
i.e., a problem whose feasible set can be decomposed into a finite union of polyhedra such that the restriction of the objective function to each polyhedron is linear.
Such problems can be characterized as polynomial problems in which every higher-order term is a product of an arbitrary number of binary variables and at most one continuous variable.
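
As an illustration of this characterization, here is a standard linearization of one such term. It is a sketch under the assumption that the continuous variable satisfies $0 \leq x \leq U$ for a known bound $U$; the symbols $b_1, b_2, x, w, U$ are ours and are not used elsewhere in the paper. A term $b_1 b_2 x$ with binary $b_1, b_2$ can be replaced by a new variable $w$ subject to
\begin{align*}
  & 0 \leq w \leq x, \\
  & w \leq U b_1, \quad w \leq U b_2, \\
  & w \geq x - U (2 - b_1 - b_2).
\end{align*}
If $b_1 = b_2 = 1$, the first and the last constraint give $w = x$. If $b_1 = 0$ or $b_2 = 0$, the second line together with $w \geq 0$ gives $w = 0$, and the last constraint is slack because $x \leq U$. In both cases $w = b_1 b_2 x$. When $x$ is a probability, $U = 1$ suffices.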

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%In a multi-phase Stackelberg game at least one of the players can choose more than one action in a prescribed order. Knowledge about past actions is passed to the next players, when deciding about their moves. In this more general scenario backward propagation also gives a way to compute player's optimal strategies.

%In this work we concentrate on a case in which leader have one move and follower can perform two consecutive moves. After first move he gain information about obtained utility, form which he can infer partial information about the leader's mixed strategy. This knowledge is used by him in order to choose his second move. Follower's optimal strategy is a pair of moves, which give the highest total utility computed as a sum of utilities from both moves.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Solving two-phase games}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

In this section we derive quadratic and mixed integer
optimization problems that compute optimal strategies in two-phase Bayesian Stackelberg games.
We start with a MIQP version and then apply the linearization trick described in Section~\ref{ssec:linearization} to get a MILP formulation.

\subsection{A solution with quadratic programming} %\label{ssec:mqlp formulation}

Recall the MIQP formulation~\eqref{mqlp:formulation}. We show that it computes the defender's expected payoff together with optimal strategies of the attacker and the defender.

The objective function of~\eqref{mqlp:formulation} is the defender's expected payoff $\E(R + R')$ from equation~\eqref{eq:leader payoff}, which he wishes to maximize.
Conditions~\eqref{mqlp:x probability}, \eqref{mqlp:y probability} and \eqref{mqlp:z probability}, together with~\eqref{mqlp:positivness constraint}, ensure that
$x \in \Prob(I)$, $y^t \in \Prob(J_t)$ and $z^{t, j, c} \in \Prob(K_t)$, respectively.

We introduce variables $\gamma_{t, j, c}$ and constraints 
  that enforce that 
\[
\gamma_{t, j, c} = \max_{k \in K_t} \sum_{i \in I_{t, j, c}} x_i c'_{i, t, j, k}.
\]
From~\eqref{mqlp:second move constraint one}, we have $\gamma_{t, j, c} \geq \max_{k \in K_t} \sum_{i \in I_{t, j, c}} x_i c'_{i, t, j, k}$ for each $c \in \C_{t, j}$. Therefore, for each $1 \leq t \leq n$ and each $j \in J_t$, we have
\begin{align*}
\sum_{c \in \C_{t, j}} \gamma_{t, j, c} \geq \sum_{c \in \C_{t, j}} \max_{k \in K_t} \sum_{i \in I_{t, j, c}} x_i c'_{i, t, j, k} \geq \\
\sum_{c \in \C_{t, j}} \sum_{k \in K_t} \sum_{i \in I_{t, j, c}} x_i z^{t, j, c}_k c'_{i, t, j, k} = \\
\sum_{k \in K_t} \sum_{i \in I} x_i z^{t, j, c_{i, t, j}}_k c'_{i, t, j, k}.
\end{align*}
Hence condition~\eqref{mqlp: second move constraint two} guarantees that each inequality in the above chain is an equality.
In particular, equality in the first step guarantees that $\gamma_{t, j, c} \leq \max_{k \in K_t} \sum_{i \in I_{t, j, c}} x_i c'_{i, t, j, k}$ for each $c \in \C_{t, j}$.
It follows from Proposition~\ref{pro:second move} that
strategy $z^{t, j, c}$ is optimal if and only if
\[
  \sum_{k \in K_t} \sum_{i \in I_{t, j, c}} z^{t, j, c}_k x_i c'_{i, t, j, k} = \gamma_{t, j, c}.
\]
Equality in the second step of the above chain guarantees that this is indeed the case.

From Proposition~\ref{pro:first move}, strategy $y^t$ is optimal if and only if
\begin{align*}
\sum_{j' \in J_t} y^t_{j'} \Bigl( \sum_{i \in I} x_i c_{i, t, j'} + \sum_{c \in \C_{t, j'}} \max_{k \in K_t} \sum_{i \in I_{t, j', c}} x_i c'_{i, t, j', k} \Bigr)
\geq \\
\sum_{i \in I} x_i c_{i, t, j} + 
  \sum_{c \in \C_{t, j}} \max_{k \in K_t} \sum_{i \in I_{t, j, c}} x_i c'_{i, t, j, k} \text{ for each } j \in J_t.
\end{align*}
This inequality is encoded as~\eqref{mqlp:first move constraint}.

\subsection{Linearization}

We use the substitutions
\begin{align*}
x_i y^t_j z^{t, j, c_{i, t, j}}_k \leftarrow w_{i, t, j, k},\\
x_i y^t_j \leftarrow \sum_{k \in K_t} w_{i, t, j, k},\\
x_i z^{t, j, c_{i, t, j}}_k \leftarrow s_{i, t, j, k},\\
y^t_j \gamma_{t, j, c} \leftarrow u_{t, j, c}
\end{align*}
for $1 \leq t \leq n$, $i \in I$, $j \in J_t$, $k \in K_t$, $c \in \C_{t, j}$.

Constraints~\eqref{milp:subsitution s 1}, \eqref{milp:z probability}, \eqref{milp:subsitution s 2} and \eqref{milp:positiveness constraint} imply that
\[
  s_{i, t, j, k} = \left\{
  \begin{array}{ll}
    x_i & \text{ if } z^{t, j, c_{i, t, j}}_k = 1 \\
    0 & \text{ if } z^{t, j, c_{i, t, j}}_k = 0,
  \end{array}
  \right.
\]
hence indeed $s_{i, t, j, k} = x_i z^{t, j, c_{i, t, j}}_k$ in any feasible solution.

Constraints~\eqref{milp:u1}, \eqref{milp:u2} and \eqref{milp:positiveness constraint} imply, for sufficiently large $M$, that
\[
  u_{t, j, c} = \left\{
  \begin{array}{ll}
    \gamma_{t, j, c} & \text{ if } y^t_j = 1 \\
    0 & \text{ if } y^t_j = 0,
  \end{array}
  \right.
\]
hence indeed $u_{t, j, c} = y^t_j \gamma_{t, j, c}$.
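
For intuition, one standard big-$M$ encoding that yields such a case distinction is sketched below. This is our illustration only; the actual constraints~\eqref{milp:u1} and~\eqref{milp:u2} are stated in the main paper and may differ in form. Assuming $|\gamma_{t, j, c}| \leq M$ and binary $y^t_j$, impose
\begin{align*}
  & -M y^t_j \leq u_{t, j, c} \leq M y^t_j, \\
  & \gamma_{t, j, c} - M (1 - y^t_j) \leq u_{t, j, c} \leq \gamma_{t, j, c} + M (1 - y^t_j).
\end{align*}
If $y^t_j = 1$, the second line gives $u_{t, j, c} = \gamma_{t, j, c}$ and the first line is slack. If $y^t_j = 0$, the first line gives $u_{t, j, c} = 0$ and the second line is slack. Under this assumption any $M \geq \max_{i, k} |c'_{i, t, j, k}|$ is large enough, because $\gamma_{t, j, c}$ is a maximum of sums $\sum_{i \in I_{t, j, c}} x_i c'_{i, t, j, k}$ with $x_i \geq 0$ and $\sum_{i \in I} x_i = 1$.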

Finally, constraints~\eqref{milp:w1}, \eqref{milp:w2}, \eqref{milp:w3} and \eqref{milp:positiveness constraint} imply that
\[
  w_{i, t, j, k} = \left\{
  \begin{array}{ll}
    x_i & \text{ if } y^t_j = 1 \text{ and } z^{t, j, c_{i, t, j}}_k = 1 \\
    0 & \text{ if } y^t_j = 0 \text{ or } z^{t, j, c_{i, t, j}}_k = 0,
  \end{array}
  \right.
\]
hence indeed $w_{i, t, j, k} = x_i y^t_j z^{t,j,c_{i,t,j}}_k$.

This shows the equivalence of the MILP formulation~\eqref{milp:formulation} and the MIQP formulation~\eqref{mqlp:formulation}.

\section{Transformation to single-phase game}

\begin{example}\label{ex:example}
For a Los Angeles airport security game with $4$ terminals and $2$ patrols,
with payoff matrices given in Table~\ref{tab:example} (notice the varying attacker payoffs),
the two-phase MILP formulation has $465$ variables, of which $115$ are binary. In the reduction to a single-phase game discussed above, the attacker has $34505$ moves. For this reduction, the MIQP formulation of DOBSS (which is much smaller than the MILP formulation) has $34513$ variables, of which $34505$ are binary.

\begin{table}[t]
\begin{tabular*}{\columnwidth}{@{\extracolsep{\fill}}@{\extracolsep{\fill}}rr@{\hskip 1pt}rr@{\hskip 1pt}rr@{\hskip 1pt}rr@{\hskip 1pt}rr@{\hskip 1pt}r}
\toprule
 & \multicolumn{2}{c}{$T_1$} & \multicolumn{2}{c}{$T_2$} & \multicolumn{2}{c}{$T_3$} & \multicolumn{2}{c}{$T_4$} & \multicolumn{2}{c}{$\emptyset$}\\
\midrule
$T_1T_2$ & 13,& -13 & 24,& -21 & -42,& 41 & -85,& 81 & 0,& 0\\
$T_1T_3$ & 13,& -12 & -20,& 23 & 44,& -45 & -80,& 81 & 0,& 0\\
$T_1T_4$ & 15,& -15 & -22,& 20 & -45,& 42 & 85,& -85 & 0,& 0\\
$T_2T_3$ & -14,& 13 & 24,& -25 & 41,& -42 & -82,& 84 & 0,& 0\\
$T_2T_4$ & -13,& 14 & 23,& -24 & -40,& 43 & 81,& -85 & 0,& 0\\
$T_3T_4$ & -13,& 13 & -25,& 21 & 42,& -44 & 85,& -85 & 0,& 0\\
\bottomrule
\end{tabular*}
\begin{tabular*}{\columnwidth}{@{\extracolsep{\fill}}@{\extracolsep{\fill}}rr@{\hskip 1pt}rr@{\hskip 1pt}rr@{\hskip 1pt}rr@{\hskip 1pt}rr@{\hskip 1pt}r}
\toprule
 & \multicolumn{2}{c}{$T_1$} & \multicolumn{2}{c}{$T_2$} & \multicolumn{2}{c}{$T_3$} & \multicolumn{2}{c}{$T_4$} & \multicolumn{2}{c}{$\emptyset$}\\
\midrule
$T_1T_2$ & 54,& -68 & 125,& -124 & -202,& 208 & -403,& 415 & 0,& 0\\
$T_1T_3$ & 74,& -64 & -115,& 120 & 212,& -225 & -406,& 403 & 0,& 0\\
$T_1T_4$ & 65,& -50 & -112,& 113 & -219,& 224 & 424,& -400 & 0,& 0\\
$T_2T_3$ & -72,& 64 & 108,& -123 & 225,& -207 & -418,& 403 & 0,& 0\\
$T_2T_4$ & -60,& 50 & 100,& -100 & -220,& 217 & 400,& -412 & 0,& 0\\
$T_3T_4$ & -71,& 56 & -113,& 123 & 200,& -216 & 407,& -424 & 0,& 0\\
\bottomrule
\end{tabular*}
\caption{Payoff matrices discussed in Example~\ref{ex:example}}\label{tab:example}
\end{table}
\end{example}



\bibliography{references}

\end{document}
