\documentclass[accepted]{uai2023} 

\usepackage[american]{babel}

%% Some suggested packages, as needed:
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
\usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams

%%% Load required packages here (note that many are included already).
\usepackage{float}
\usepackage{soul}
\usepackage{url}
\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{subcaption}

\usetikzlibrary{calc}
\usepackage{pgfplots}
\usepackage{pgfplotstable}
\pgfplotsset{compat=1.16}

\usepackage{balance} % for balancing columns on the final page
% \usepackage{booktabs}
\setlength{\heavyrulewidth}{1.5pt}
\setlength{\abovetopsep}{4pt}
% \usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{calrsfs}
\usepackage{enumitem}
\usepackage[c3, nocomma]{optidef}
\usepackage{cuted}

\usepackage{array,longtable,tabularx,tabulary}
\newcolumntype{L}{>{\raggedright\arraybackslash}X}
\usepackage{ltablex}

\usepackage{amsthm}
\theoremstyle{definition}
\newtheorem{example}{Example}[section]

\theoremstyle{remark}
\newtheorem{remark}{Remark}[section]
\newtheorem{proposition}{Proposition}[section]
\newtheorem{corollary}{Corollary}[section]


%% Provided macros
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)

%% Self-defined macros
\newcommand{\swap}[3][-]{#3#1#2} % just an example

\newcommand{\subsectioninline}[1]{\noindent \textbf{#1:}}

\DeclareMathOperator{\Prob}{Prob}
\DeclareMathOperator{\Pure}{Pure}
\DeclareMathOperator*{\E}{E}
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\newcommand{\C}{{\mathcal{C}}}

\title{Two-phase Attacks in Security Games}

% The standard author block has changed for UAI 2023 to provide
% more space for long author lists and allow for complex affiliations
%
% All author information is automatically removed by the class for the
% anonymous submission version of your paper, so you can already add your
% information below.
%
% Add authors
\author[1,2]{\href{mailto:<amn@mimuw.edu.pl>?Subject=Two-phase Attacks in Security Games (UAI 2023)}{Andrzej~Nagórko}{}}
\author[2]{\href{mailto:<pawel.ciosmak@ideas-ncbr.pl>?Subject=Two-phase Attacks in Security Games (UAI 2023)}{Paweł~Ciosmak}{}}
\author[2,3]{\href{mailto:<tomasz.michalak@ideas-ncbr.pl>?Subject=Two-phase Attacks in Security Games (UAI 2023)}{Tomasz Michalak}{}}
% Add affiliations after the authors
\affil[1]{%
    Department of Mathematics, University of Warsaw, ul. Banacha 2, 02-097 Warsaw, Poland
}
\affil[2]{%
    Ideas NCBR, ul. Chmielna 69, 00-801 Warsaw, Poland
}
\affil[3]{%
    Department of Computer Science, University of Warsaw, ul. Banacha 2, 02-097 Warsaw, Poland
}

\begin{document}
\maketitle

\begin{abstract}
  A standard model of a security game assumes a one-off assault during which the attacker cannot update their strategy even if new actionable insights are gained in the process.
In this paper, we propose a version of a security game that takes into account the possibility of a two-phase attack. Specifically, in the first phase, the attacker makes a preliminary move to gain extra information about this particular instance of the game. Based on this information, the attacker chooses an optimal concluding move. 
We derive a compact-form mixed-integer linear program that computes an optimal strategy of the defender. Our simulation shows that this strategy mitigates serious losses inflicted on the defender by a two-phase attack while still protecting well against less sophisticated attackers.
\end{abstract}

\section{Introduction}

%First paragraph about the topic - security games are widely used and important in the literature


%First paragraph about the domain of the paper in general
In a classic economic model of a Stackelberg game~\citep{von1934marktform}, the leader chooses his strategy first, and while doing this, he is observed by the followers, who can adjust their response accordingly. In the last two decades, this model has received significant attention in the context of security applications, where a defender (the leader in the Stackelberg game) distributes limited security resources to guard a set of targets against an attacker (the follower in the Stackelberg game). For instance, Stackelberg games were applied in such domains as infrastructure security  (ARMOR~\citep{pita2009using}, IRIS~\citep{tsai2009iris}, PROTECT \citep{shieh2012protect}), green security (PAWS~\citep{yang2014adaptive}, MIDAS~\citep{haskell2014robust}), opportunistic crimes (TRUSTS~\citep{yin2012trusts}), as well as cybersecurity~\citep{zhang2021bayesian}.
In all these contexts, Stackelberg games are often called \textit{security games}.

The attack in security games is typically modeled as a one-off assault during which the attacker has no chance to update their strategy even if new valuable information is gained in the process. This, however, does not cover certain tactics that can be applied by ever more agile covert organizations. In particular, given the improvements in border control technologies that result in significant quantities of cocaine being seized in Latin America and Europe, drug cartels have to look for more innovative smuggling methods and routes. Unfortunately, according to a report by the European Monitoring Centre for Drugs and Drug Addiction~\citep[p.~4]{reportEMCDDA}: ``\textit{These groups are innovative and skilled in switching and modifying both trafficking routes and} modi operandi \textit{to circumvent law enforcement activities. They are quick to identify and exploit new opportunities for cocaine trafficking (...) shift transit routes 
and storage points to capitalize on the presence of ineffective border controls.}'' To look for such new routes and access points, in the first phase of an operation, drug cartels can send ``low-profile'' couriers who carry small amounts of drugs and whose key goal is to gain information. In the second phase, given the extra insight, the cartel decides which routes to use for transports of much larger quantity and value. 
%The ARMOR system deployed at the Los Angeles Airport~\citep{pita2009using} used a model of security games to schedule patrols with measurable success.
This paper stems from the observation that most existing models are vulnerable to such two-phase attacks, which may have significant security repercussions. 
%To the best of our knowledge, attacks of such a two-phase nature have not been thus far explored in the literature. 



%Our paper is motivated by an analysis of the 2021 crisis along the Polish-Belarusian border. In our analysis, we were concerned about the vulnerability ofpatrolling strategies computed with the standard (single-phase) model against”probe-then-smuggle” attacks. 
%On a higher level, the aim of our work is to draw the attention of the community to a potential vulnerability of the existing models in which it is implicitly assumed that an attacker cannot ”probe” the current strategy of the defender. 



%The first related body of works is that on  
%multi-stage Stackelberg games in which the attacker and defender interact in stages. In \citep{LUH1984251} authors analysed systems, where players choose among pure strategies. In \citep{zychowski2022coevolutionary}, an evolutionary algorithm for solving multi-stage Stackelberg games was proposed.  In \citep{guzman2022sequential}, an inspection game is formalized as a multiple-stage Stackelberg game, where mixed strategies of the defender are considered. It demonstrates similarity to our model, however, it differs with respect to the less general payoff function, as well as a single patrolling unit at the defender's disposition. Two-stage (but not two-phase) Stackelberg games were considered in the literature for example in \citep{anand2008strategic,gray2009outsourcing,kabul2019value,wang2022pollution}. 


 
%Instead, the literature focused rather on two-stage Stackelberg games that differ from the two-phase games proposed here (see Section 8).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%OUR CONTRIBUTION
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Against this background, we propose a security game that takes into account the possibility of a two-phase attack. Specifically, in the first phase, the attacker makes a preliminary move designed to gain extra information on the defender's activities in this particular instance of the game. Next, in the second phase, this insight is used to choose an optimal concluding move. Given this new model, we characterize optimal strategies and expected payoffs of both the defender and the attackers.
We also derive a compact-form quadratic program that computes optimal strategies and is exponentially smaller than a possible reduction to a standard Bayesian Stackelberg game. We then derive an effective mixed-integer linearization of the quadratic formulation. Moreover, we show that a strategy computed with our model mitigates serious losses of the defender from a two-phase attack while still protecting well against less sophisticated attackers. Finally, we experimentally compare the running time of the three approaches to solving two-phase Bayesian Stackelberg games discussed in this paper: a mixed-integer quadratic program, a mixed-integer linear program and a ``normal-form'' transformation to a single-phase Bayesian Stackelberg game.

%In the standard security game model, zero knowledge of the attacker about the defender’s defensive positions is assumed. It turns out that solutions computed in this way are very fragile in this regard: as we show in Table 7 in the paper, against the standard model, if the attacker gains information about the presence/absence of a patrol in one place, he may use this knowledge to successfully perform an attack in a different place, incurring a huge loss to the defender. Table 7 together with the discussion in Section 3 shows that the solutions computed with the two-phase model are much more robust in this regard.

%\includegraphics[width=\columnwidth]{pictures/granica.png}



%The remainder of this paper is organized as follows. In the next section, we introduce the necessary background and notation. Section~\ref{sec:motivating_example} presents a motivating example of a two-phase attack. In Section~\ref{sec:our_model}, we introduce the new model and characterize optimal strategies of the defender and the attackers. In Section~\ref{sec:milp formulation} we derive the mixed quadratic and mixed integer linear programming formulations of the problem. In Section~\ref{sec:comparison}, we discuss the relation of the new model to the standard Stackelberg games. An experimental comparison of one-phase and two-phase models is done in Section~\ref{sec:experiments}. Related research is discussed in Section~\ref{sec:related:work}. Conclusions follow. 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\section{Border protection}\label{sec:border}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%Our paper is motivated by an analysis of the 2021 crisis along the Polish- Belarusian border. In our analysis, we were concerned about the vulnerability of patrolling strategies computed with the standard (single-phase) model against ”probe-then-smuggle” attacks. In the standard security game model, zero knowledge of the attacker about the defender’s defensive positions is assumed. It turns out that solutions computed in this way are very fragile in this regard: as we show in Table 7 in the paper, against the standard model, if the attacker gains information about the presence/absence of a patrol in one place, he may use this knowledge to successfully perform an attack in a different place, incurring a huge loss to the defender. Table 7 together with the discussion in Section 3 shows that the solutions computed with the two-phase model are much more robust in this regard.

%\includegraphics[width=\columnwidth]{pictures/granica.png}

%A very recent real-world example of the tactics that are explicitly modeled in our two-phase game are the actions of Lukashenko’s regime in Belarus which exploits immigrants to probe the border with Ukraine (which puts them in extreme danger due to the war). Romanenko V., ”Belarus uses migrants for intelligence on the border with Ukraine”, Ukrainska Pravda, https://www.pravda. com.ua/eng/news/2022/12/6/7379514/ (accessed Dec 9, 2022).

%On a higher level, the aim of our work is to draw the attention of the community to a potential vulnerability of the existing models in which it is implicitly assumed that an attacker cannot ”probe” the current strategy of the defender. 

% Attacks are guided by Belarus.

\section{Motivation: probing of the Ukrainian border by Belarus}\label{sec:motivating_example}

A recent real-world example of the tactics explicitly modeled in our two-phase game is provided by the actions of Lukashenko's regime in Belarus, which exploits immigrants to probe the border with Ukraine. 
According to the National Resistance Center of the Special Operations Forces of Ukraine~\citep{belarus}:
{\em ``Belarusian border guards deliberately send refugees from Iran and Pakistan to Ukrainian borders in order to search for vulnerable areas. In this way, the Belarusians check vulnerable and insufficiently protected areas of the border with Ukraine, which can be used for the passage of enemy armed forces. The enemy uses similar tactics on the border with Latvia.'' }
This callous behaviour puts the lives of the immigrants in extreme danger, due both to the very difficult terrain and to the ongoing war. In more detail, Ukraine's northwestern border of nearly 900 km runs through a heavily forested area with forbidding wetlands and the Chernobyl Exclusion Zone. On top of that, the border, which was crossed by the Russian army in February 2022 and subsequently restored by the Ukrainian counteroffensive, is now heavily fortified with trenches, walls and minefields.

Unfortunately, even though the border is now one of the most dangerous in the world, the Belarusian border guards organize and coordinate groups of immigrants who attempt to cross it. The aim is to uncover and disorganise Ukrainian defences, which have to react to every such attempt because of the threat from Russian saboteurs. Given the sophisticated electronic protection measures, most such border crossings are detected. However, this does not mean that the border is impenetrable, as detection does not guarantee that a patrol is close enough to prevent the entry. Nevertheless, even if this particular section of the border is unmanned at the moment of entry, the Ukrainian headquarters sends a team to the area.
This means that a follow-up entry attempt at the same section of the border is hardly possible.

% Similar ...

%To motivate our research, let us demonstrate shortcomings of    the standard approach to the border patrolling problem based on    a single-phase security game.
%\begin{figure}[ht]
%\includegraphics[width=\columnwidth]{pictures/att.png}
%\caption{Attacks are coordinated by forces of Lukashenko’s regime}
%\end{figure}

%\begin{figure}[ht]
%\includegraphics[width=\columnwidth]{pictures/def.png}
%\caption{Border patrol}
%\end{figure}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\section{Standard approach fails against two-phase attacks}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%Let us recall one of the problems posed by 
%  Los Angeles World Airport (LAWA) police, as described in~\citep{pita2009using}. 
%LAWA police wished to obtain an assignment of 
%  canines to patrol routes through the terminals inside Los Angeles International Airport (LAX).
%The ARMOR-canine solution deployed at LAX was based on an 
%  optimal strategy for a Bayesian Stackelberg game that we will recall below.

Let us consider a scaled-down version of the problem, with four sections of the Belarus-Ukraine border ($S_1$, $S_2$, $S_3$, and $S_4$) and two patrol units.
This setting can be modelled as a standard security game in the spirit of the one used at the Los Angeles World Airport~\citep{pita2009using}. Pure strategies (moves) of the Ukrainian defenders  are possible assignments of
  patrols to the sections of the border,
$I = \left\{ S_1S_2, S_1S_3, S_1S_4, S_2S_3, S_2S_4, S_3S_4 \right\}$.

We assume two possible types of the attacker: low- and high-profile human traffickers (type $1$ and $2$, respectively). 
The high-profile type of the attacker inflicts a much larger loss upon the defender, as they organize much bigger groups. Both types have the same strategy space: an attacker of each type can either choose one of the four sections of the border or back off, i.e., $J_{1} = J_{2} = \left\{ S_1, S_2, S_3, S_4, \emptyset \right\}$. The payoffs of both parties, depending on the attacker type, increase linearly with the index $i$ of section $S_i$: for a high-profile attacker, the payoffs for a successful attack are $50$, $100$, $150$ and $200$, respectively, and for a low-profile attacker the payoffs are five times smaller. The attacker's payoffs for an unsuccessful attack are negative at the same scale. The defender's payoffs are the opposite, with small random noise added, drawn uniformly from the interval $[-5, 5]$. 

Assuming that probabilities of attacks by these two types are $p_{1} = 0.8$ for the low-profile attacker and $p_{2}=0.2$ for the high-profile one, an optimal strategy for the defender is:
%, a Stackelberg equilibrium in the corresponding Bayesian Stackelberg game, is:
\begin{align*}
(x_{S_1S_2},  x_{S_1S_3}, x_{S_1S_4},& x_{S_2S_3}, x_{S_2S_4}, x_{S_3S_4}) = \\ &(0\%, 50\%, 0\%, 0\%, 50\%, 0\%).
\end{align*}
According to this strategy, border sections $S_1$ and $S_2$ are never protected simultaneously. 
%As we show in Section~\ref{sec:experiments},
Such a situation is typical for
Stackelberg equilibria in one-phase games and can be easily exploited by performing a two-phase attack. 
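
\begin{example}
The following back-of-the-envelope check illustrates why this strategy leaves a single-phase attacker with nothing to gain. It is only a sketch: it takes the attacker payoffs exactly as described above and assumes that backing off yields payoff $0$. Under the strategy above, each section is patrolled with probability $50\%$: $S_1$ and $S_3$ under assignment $S_1S_3$, and $S_2$ and $S_4$ under assignment $S_2S_4$. An attacker whose payoff for attacking $S_i$ is $v_i$ on success and $-v_i$ on failure therefore expects
\[
50\% \cdot v_i + 50\% \cdot (-v_i) = 0
\]
from every section, which equals the assumed payoff for backing off. Each of the two attacker types is thus indifferent between all of its moves.
\end{example}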

\subsectioninline{A two-phase attack} Let us now assume that, unknown to the defender, the attacker has the resources and the capabilities of both the low-profile human trafficker and the high-profile one, and is able to try two sections of the border sequentially, in phases. Given the optimal strategy derived above, let us assume that, in the first phase, a low-profile human trafficker tries to breach the border at section $S_1$. This provides valuable information to the attacker, irrespective of how the defender is positioned, because the attacker now knows the conditional probability distribution of the defender's resources.

In our computation we assumed that the attacker could not attack the same target twice (which was modeled by setting the second-phase payoffs for repeating the same attack to minus infinity). This was motivated by the border-patrolling scenario: a small-scale attack (provocation) elicits the border patrol's response; the information gained by the attacker is the response time (they learn whether a patrol was close by or not), and they cannot safely attack at the same place again.
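
\begin{example}
To see what the probe reveals, consider the single-phase strategy above, $x_{S_1S_3} = x_{S_2S_4} = 50\%$, and a first-phase attack on $S_1$. As a sketch, assume that the first-phase outcome tells the attacker exactly whether $S_1$ was patrolled. Since $S_1$ is patrolled only under assignment $S_1S_3$, we get
\begin{gather*}
P(S_1 \text{ patrolled}) = x_{S_1S_2} + x_{S_1S_3} + x_{S_1S_4} = 50\%, \\
P(X = S_1S_3 \mid S_1 \text{ patrolled}) = 1, \\
P(X = S_2S_4 \mid S_1 \text{ not patrolled}) = 1.
\end{gather*}
In the first case, sections $S_2$ and $S_4$ are certainly unguarded; in the second case, section $S_3$ is (as is $S_1$, which cannot be attacked again). Either way, the second-phase attacker can pick a section that is unpatrolled with certainty.
\end{example}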

Let $t \in \{ 0\%, 17\%, 33\%, 50\%, 67\%, 83\%, 100\% \}$ be a chance of encountering a two-phase attacker, $(1-t) \cdot 80\%$ be a probability of encountering a low-profile single-phase attacker and $(1-t) \cdot 20\%$ be a likelihood of encountering a high-profile single-phase attacker.
For $t=0\%$ this is the standard one-phase model, while $t=100\%$ describes a pure two-phase attack.

Figure~\ref{fig:strategy} shows that the presence of two-phase attackers significantly alters the Stackelberg equilibrium of the game.
For example, for $33\%$ probability of a two-phase attack (with $53\%$ chance of a single-phase low-profile attack and $13\%$ chance of a single-phase high-profile attack, keeping the $4 : 1$ low- to high-profile ratio), the optimal defender strategy becomes
\begin{align*}
(x_{S_1S_2},  x_{S_1S_3}, x_{S_1S_4},& x_{S_2S_3}, x_{S_2S_4}, x_{S_3S_4}) = \\ &(12\%, 15\%, 17\%, 17\%, 18\%, 21\%).
\end{align*}
As we see in Figure~\ref{fig:strategy}, two-phase Stackelberg equilibria are much more robust against changes of attacker profiles.

\begin{figure}[t]
\centering
\input{figures/strategy6.pgf}
\caption{Each row presents an optimal mixed strategy of the defender against
a group of attackers with a given chance of encountering a two-phase attack.
As we can see in the last row, without the presence of two-phase attackers the Stackelberg equilibrium heavily over-fits to the random noise in the payoff matrices.}
\label{fig:strategy}
\end{figure}


\begin{figure}[t]
\centering
\input{figures/strategy_vs_composition6.pgf}
\caption{Expected defender payoff when playing a strategy from Figure~\ref{fig:strategy} against a given chance of a two-phase attack.
As we can see in the last column, the loss incurred by playing a strategy that ignores the possibility of a two-phase attack is an order of magnitude larger than over-cautious protection against such attacks.
}
\label{fig:payoffs}
\end{figure}
Figure~\ref{fig:payoffs} shows how the defender's payoffs change against different compositions of attacker groups. For example, the expected payoff of the defender, $\E(R) = \mathbf{0.7}$ against a
  single-phase attack, drops to $\mathbf{-175}$ when the single-phase strategy is pitted against a two-phase attacker.
  
In order to fix this flaw, we propose a new model which allows for considering one-phase and two-phase attackers simultaneously.
  With our security model, the expected payoff against coordinated attackers rises from $\mathbf{-175}$ to $\mathbf{-16.2}$ (the defender is still at a disadvantage). The optimal strategy:
\begin{align*}
(x_{S_1S_2}, x_{S_1S_3}, x_{S_1S_4}, x_{S_2S_3},  x_{S_2S_4}, x_{S_3S_4}) &=\\
(8.5\%, 11\%, 12\%, 20\%, 25\%, 23\%)&
\end{align*}
forces the low-profile attacker to attack $S_1$ and the high-profile attacker to back off if $S_1$ was not patrolled.
Note that this comes at a cost: for the uncoordinated (one-phase) attack, when low- and high-profile attackers act independently, this strategy yields a payoff of $\mathbf{-7.89}$ for the defender (a drop from $\mathbf{0.7}$).

%\begin{figure*}
%    \begin{center}
%        \begin{istgame}
%            \xtShowEndPoints % solid nodes
%            \xtdistance{20mm}{55mm}
%            \istrooto(0)(0, 0)[box node, densely dotted, rounded corners=0.8em, inner sep=0.5em, fill=gray!10]{Nature's Choice of Adversary}
%            \istb{L, \frac 15}[fill=white]
%            \istb[very thick]{H, \frac 45}[fill=white]
%            \endist
%            \xtdistance{20mm}{27mm}
%            \istrooto(L)(0-1){}
%            \istb{D_1, x_1}[fill=white]
%            \istb{D_2, x_2}[fill=white]
%            \endist
%            \istrooto(H)(0-2){}
%            \istb{D_1, x_1}[fill=white]{}
%            \istb[very thick]{D_2, x_2}[fill=white]{}
%            \endist
%            \xtInfosetO[fill=gray!10](L)(H){Defender}(2em)
%            \xtdistance{25mm}{10mm}
%            \istrooto(LT_1)(L-1){}
%            \istb{A_1, y^L_1}[fill=white, near end]{+}
%            \istb{A_2, y^L_2}[fill=white]{-}
%            \istb{\emptyset, y^L_3}[fill=white, near end]{0}
%            \endist
%            \istrooto(LT_2)(L-2){}
%            \istb{A_1, y^L_1}[fill=white, near end]{-}
%            \istb{A_2, y^L_2}[fill=white]{+}
%            \istb{\emptyset, y^L_3}[fill=white, near end]{0}
%            \endist
%            \xtInfosetO[fill=gray!10](LT_1)(LT_2){Attacker L}(2em)
%            % Attacker H
%            \istrooto(HT_1)(H-1){}
%            \istb{A_1, y^H_1}[fill=white, near end]{++}
%            \istb{A_2, y^H_2}[fill=white]{--}
%            \istb{\emptyset, y^H_3}[fill=white, near end]{0}
%            \endist
%            \istrooto(HT_2)(H-2){}
%            \istb{A_1, y^H_1}[fill=white, near end]{--}
%            \istb[very thick]{A_2, y^H_2}[fill=white]{++}
%            \istb{\emptyset, y^H_3}[fill=white, near end]{0}
%            \endist
%            \xtInfosetO[fill=gray!10](HT_1)(HT_2){Attacker H}(2em)
%        \end{istgame}
%        \caption{A Bayesian Stackelberg security game in extensive form, with low-profile (L) and high-profile (H) attackers.}
%    \end{center}
%\end{figure*}

\section{Preliminaries}\label{sec:preliminaries}
In the Bayesian Stackelberg game, the defender plays against
  a group of attackers of $n$ distinct types.
In each round, the defender plays against a single attacker and
encounters the attacker of type~$1 \leq t \leq n$ randomly, with probability $p_t$.
Attackers may have different sets of moves at their disposal
  that inflict different damage on the defender.

Let $I$ denote the set of defender's moves.
In the Bayesian Stackelberg game, the defender picks his mixed strategy $x$ first.
Here $x = \{ x_i \}_{i \in I}$ is a probability measure on $I$, which we denote by $x \in \Prob(I)$ with
$\Prob(I) = \left\{ x \colon I \to \mathbb{R} \colon \sum_{i \in I} x_i = 1, x \geq 0 \right\}$.
Strategy $x$ does not depend on $t$, as the defender does not know the type of attacker he will encounter.
Let $J_t$ denote the set of moves of attacker of type~$t$.
Attacker~$t$ picks his strategy $y^t = y^t(x)\in \Prob(J_t)$
  second, with the knowledge of the defender's strategy $x$.
%
In each round of the game, both players move independently, according to the strategies $x$ and $y^t(x)$ they picked beforehand.
Let $r_{i, t, j}$ denote the defender's payoff if he played move $i \in I$ against the attacker of type $1 \leq t \leq n$ who played a move $j \in J_t$.
Let $c_{i, t, j}$ denote attacker's payoff (which may be different from $-r_{i, t, j}$).
%
Attacker $t$ picks an optimal strategy $\overline{y}^t = \overline{y}^t(x)$ that depends on strategy $x$ known by him and that
maximizes his expected payoff $\overline{c} = \sum_{i \in I} \sum_{j \in J_t} x_i \overline{y}^t_j c_{i, t, j}$.
This payoff is maximized by a pure strategy, i.e., $\overline{y}^t$ is optimal if and only if $\overline{c} \geq \sum_{i \in I} x_i c_{i, t, j}$ for each $j \in J_t$.
%
The defender acts to maximize his expected payoff against the optimal strategies of the attackers,
i.e., he picks an optimal strategy $\overline{x}$ 
that maximizes his expected payoff $\sum_{i \in I} \sum_{t = 1}^n \sum_{j \in J_t} p_t x_i \overline{y}^t_j r_{i, t, j}$.

These observations, coupled with a linearization technique, lead to a
  mixed-integer linear programming formulation of Bayesian Stackelberg games, published by \citet{paruchuri2008playing} as the celebrated DOBSS algorithm.
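
For reference, the quadratic program that underlies this formulation can be sketched in the notation above as follows. The auxiliary variable $a_t$ (the best-response value of the attacker of type $t$) and the large constant $M$ are symbols introduced here for illustration only; attacker strategies are restricted to pure ones, $y^t_j \in \{0, 1\}$.
\begin{align*}
\max_{x, y, a} \quad & \sum_{t=1}^n \sum_{i \in I} \sum_{j \in J_t} p_t \, x_i \, y^t_j \, r_{i, t, j} \\
\text{s.t.} \quad & \sum_{i \in I} x_i = 1, \quad \sum_{j \in J_t} y^t_j = 1, \\
& 0 \leq a_t - \sum_{i \in I} x_i c_{i, t, j} \leq (1 - y^t_j) M, \\
& x_i \in [0, 1], \quad y^t_j \in \{0, 1\}, \quad a_t \in \mathbb{R}.
\end{align*}
The third constraint, imposed for each type $t$ and each move $j \in J_t$, makes $a_t$ an upper bound on the attacker's expected payoff for every move and forces equality for the selected move with $y^t_j = 1$, so that $y^t$ is a best response to $x$. The products $x_i y^t_j$ make the objective quadratic; the linearization replaces each product with a new variable.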


%
% Payoff normalization subsection
%

% \input{AAMAS-2023 Formatting Instructions/normalization_standard.tex}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Our model}\label{sec:our_model}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


Let us now describe our model of a two-phase security game, which is an extension of the model of Bayesian Stackelberg games specified above in Section~\ref{sec:preliminaries}. 

In a {\bf two-phase Bayesian Stackelberg game} the defender picks his mixed strategy $x \in \Prob(I)$, where $I$ denotes the set of possible defender's moves. Then the attacker of type $t$ (encountered with probability $p_t$) picks his first-phase mixed strategy $y^t(x) \in \Prob(J_t)$ with the knowledge of the defender's strategy $x$, where $J_t$ denotes the set of possible first-phase moves of the attacker of type~$t$.
After both the defender and the attacker make their moves $i \in I$ and $j \in J_t$ independently according to probability distributions $x$ and $y^t(x)$, the attacker learns his first-phase payoff $c_{i, t, j}$. 
This narrows the range of moves that the defender may have played.
With this information the attacker picks his second-phase mixed strategy $z^{t, j, c_{i, t, j}}(x) \in \Prob(K_t)$, where $K_t$ denotes the set of possible second-phase moves of the attacker of type $t$, and makes his second-phase move $k \in K_t$ according to this probability distribution.
 The outcome of the game for the defender is $r_{i, t, j} + r'_{i, t, j, k}$,
  where $r$ denotes the first-phase defender's payoff and $r'$ denotes the second-phase one. The outcome for the attacker is $c_{i, t, j} + c'_{i, t, j, k}$, where $c'$ is the second phase payoff.


In the above scenario, we assume that the attacker is much more agile than the defender, who picked his move (e.g., patrolling routes) for a period of time.
Still, the defender wishes to maximize his expected payoff $\E(r + r')$ even if the attacker can gain partial information about the defender's position $i$ with a small-scale attack.

\subsection{Expected payoffs}\label{ssec:expected payoffs}

The set of all possible play-outs in a two-phase game is
\[
\Omega = 
\left\{ (i, t, j, k) \colon i \in I, 1 \leq t \leq n, j \in J_t, k \in K_t \right\}.
\]
Let us introduce the following random variables on $\Omega$: $X$, the defender's move; $T$, the attacker's type; $Y$, the attacker's first move; $Z$, the attacker's second move; $C$, the attacker's first-phase payoff; $C'$, the attacker's second-phase payoff; $R$, the defender's first-phase payoff; $R'$, the defender's second-phase payoff. Note that all variables are defined on $\Omega$, so, for example, $R$ is evaluated on $(i, t, j, k)$, but it is equal to $r_{i, t, j}$ and does not depend on $k$.
%\begin{itemize}[label=-]
%    \item $X(i, t, j, k) = i$ denotes defender's move;
%    \item $T(i, t, j, k) = t$ denotes attacker type;
%    \item $Y(i, t, j, k) = j$ denotes attacker's first move;
%    \item $Z(i, t, j, k) = k$ denotes attacker's second move;
%    \item $C(i, t, j, k) = c_{i, t, j}$ denotes attacker's first phase payoff;
%    \item $C'(i, t, j, k) = c'_{i, t, j, k}$ denotes attacker's second phase payoff;
%    \item $R(i, t, j, k) = r_{i, t, j}$ denotes defender's first phase payoff;
%    \item $R'(i, t, j, k) = r'_{i, t, j, k}$ denotes defender's second phase payoff.
%\end{itemize}
%
We have
\begin{gather*}
P(X=i) = x_i, \quad
P(T=t) = p_t, \\
P(Y=j|T=t)=y^t_j(x),\\
P(Z=k|T=t, Y=j, C=c)=z^{t,j,c}_k(x, y).
\end{gather*}

The functional dependency $y^t(x)$ of $y^t$ on $x$ means that $y^t$ is picked with the knowledge of strategy $x$. Similarly, the dependency $z^{t, j, c}(x, y)$ of $z^{t, j, c}$ on $x$ and $y$ means that $z^{t, j, c}$ is picked with the knowledge of both $x$ and $y$.
From now on, for simplicity, we will write $y^t$ and $z^{t, j, c}$. 
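
Since the defender's move and the attacker's type are drawn independently, and the attacker's first move depends only on his type, these conditional probabilities combine into the joint distribution of a play-out:
\begin{align*}
& P(X = i, T = t, Y = j, Z = k) = \\
& \quad P(X = i) \, P(T = t) \, P(Y = j | T = t) \cdot \\
& \quad \cdot P(Z = k | T = t, Y = j, C = c_{i, t, j}) = \\
& \quad x_i \, p_t \, y^t_j \, z^{t, j, c_{i, t, j}}_k.
\end{align*}
This product is the weight with which the play-out $(i, t, j, k)$ enters the expected payoffs.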
%Moreover let $\pi_{i,t,j,k} = x_i p_t y_j z^{j, c_{i,j}}_k$.

% The joint distribution of $X, T, Y, Z$ on $\Omega$ is
% \begin{align}
% \label{eq:joint distribution}
% \begin{split}
% P(X = i, T = t, Y = j, Z = k) = 
% P(X = i) P(T = t) \cdot \\ \cdot P(Y = j | T = t) P(Z = k | T = t, Y = j, C = c) = 
% x_i p_t y^t_j z^{t, j, c}_k.
% \end{split}
% \end{align}
Using this notation, we can write the expected payoff of the defender:
\begin{align}\label{eq:leader payoff}
  & \E(R + R') = \\ 
  & \sum_{(i, t, j, k) \in \Omega} x_i p_t y^t_j z^{t, j, c_{i,t,j}}_k \left(R(i,t,j,k) + R'(i,t,j,k)\right), \notag
\end{align}
as well as the expected payoff of the attacker:
\begin{align}\label{eq:follower payoff}
  & \E(C + C') = \\
  & \sum_{(i, t, j, k) \in \Omega} x_i p_t y^t_j z^{t, j, c_{i,t,j}}_k \left(C(i,t,j,k) + C'(i,t,j,k)\right). \notag
\end{align}

Let $\C_{t, j} = \{ c_{i,t,j} \colon i \in I \}$.
Given the defender's strategy $x$, the best response of the attacker of type $t$ maximizes his payoff:
\begin{align*}
  & (\overline{y}^t, \overline{z}^{t, j, c} \colon j \in J_t, c \in \C_{t, j}) \in \\
  & \quad\quad\quad\argmax_{y^t \in \Prob(J_t), z^{t, j, c} \in \Prob(K_t)} \left\{ \E(C + C'|T=t) \right\}.
\end{align*}
Note that there is a single first-phase strategy $y^t$ for attacker of type $t$ and multiple second-phase strategies $z^{t,j,c}$ that depend on first-phase move $j \in J_t$ and first-phase reward $c \in \C_{t, j}$ obtained by the attacker.
Assuming perfect rationality of the attacker, the defender adjusts his strategy to maximize his own payoff:
 \begin{align*}
  \overline{x} \in \argmax_{x \in \Prob(I)} \left\{ 
    \sum_{t=1}^n p_t \E(R + R'|T=t) \colon  
    (y^t, z^{t, j, c}) \in \right. \\ \left. \argmax_{y, z} \left\{ \E(C + C'|T=t) \right\}
  \right\}.
  \end{align*}

%In the next section, we will derive the criteria under which the strategies of the defender and attackers are optimal.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
\subsection{Optimal strategies}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  
Let $I_{t, j, c} = \{ i \in I \colon c_{i, t, j} = c \}$ and $\C_{t, j} = \{ c_{i,t,j} \colon i \in I \}$.

\begin{proposition}\label{pro:second move}
  Assume that an attacker of type $1 \leq t \leq n$ played a
  first-phase move $j \in J_t$ 
  against the defender's strategy $x \in \Prob(I)$ and learned his first-phase
  payoff $c \in \C_{t, j}$.  
  His second-phase strategy $z^{t, j, c}$ is optimal if and only if it maximizes
  \begin{align*}
    & \E(C' | T=t, Y = j, C = c) = \\ 
    & \quad \frac 1{\sum_{i \in I_{t, j, c}} x_i} \sum_{k \in K_t} z^{t, j,c}_k \left( \sum_{i \in I_{t, j,c}} x_i c'_{i,t,j,k}\right).
  \end{align*}
  Hence any strategy $z^{t, j,c}$ that 
  distributes probability among moves $k \in K_t$ with maximal $\sum_{i \in I_{t, j,c}} x_i c'_{i,t,j,k}$ is optimal.
  There always exists an optimal \emph{pure} strategy $z^{t, j, c}$, i.e., without
  loss of generality, we may assume that an optimal attacker's strategy 
  satisfies $z^{t,j,c}_k \in \{ 0, 1 \}$ for each $k \in K_t$.
\end{proposition}
\begin{proof}
%From~\eqref{eq:joint distribution}, 

We have
\begin{align*}
P(C = c | T = t, Y = j) = \sum_{i \in I_{t, j, c}} x_i, \\
P(X = i, Z = k, C = c | T = t, Y = j) \\ =
\left\{
\begin{array}{ll}
  x_i z^{t, j, c}_k & \text{ if } c_{i, t, j} = c \\
  0 & \text{ otherwise }.
\end{array}
\right.
\end{align*}
Therefore we have 
\begin{align*}
& \E(C' | T = t, Y = j, C = c) = \\
&\sum_{i \in I} \sum_{k \in K_t} 
c'_{i, t, j, k} P(X = i, Z = k | T = t, Y = j, C = c) = \\
&\sum_{k \in K_t} \frac{\sum_{i \in I} c'_{i, t, j, k}  P(X = i, Z = k, C =c | T = t, Y = j)}{P(C = c | T = t, Y = j)} \\
%&\sum_{k \in K_t}  \frac{\sum_{i \in I_{i, t, j}} c'_{i, t, j, k}
%  x_i z^{t, j, c}_k}{
%\sum_{i \in I_{t,j,c}} x_i} = \\
& = \frac 1 {\sum_{i \in I_{t,j,c}} x_i} \sum_{k \in K_t} \sum_{i \in I_{t, j, c}} c'_{i, t, j, k} x_i z^{t,j,c}_k,
\end{align*}
as claimed.
\end{proof}

\begin{corollary}\label{cor:second move}
  Assume that an attacker of type $1 \leq t \leq n$ played a first-phase move $j \in J_t$ 
  against the defender's strategy $x \in \Prob(I)$.  
  The defender's expected second-phase payoff against an optimal attacker's strategy is
\[
\E(R' | T = t, Y = j) = \sum_{c \in \C_{t, j}} \sum_{i \in I_{t, j, c}} x_i r'_{i, t, j, k_{t, j, c}},
\]
  where
\[
k_{t, j, c} \in \argmax_{k \in K_t} \sum_{i \in I_{t, j, c}} x_i c'_{i, t, j, k}.
\] 
\end{corollary}
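
\begin{example}
The following toy instance, with hypothetical numbers chosen only for illustration (they are not taken from our experiments), shows how the optimal second-phase move is selected.
Fix a type $t$ and a first-phase move $j$, and let $I = \{1, 2, 3\}$, $K_t = \{a, b\}$ and $x = (\tfrac12, \tfrac14, \tfrac14)$.
Suppose that $c_{1,t,j} = c_{2,t,j} = 0$ and $c_{3,t,j} = 1$, so that $\C_{t,j} = \{0, 1\}$, $I_{t,j,0} = \{1, 2\}$ and $I_{t,j,1} = \{3\}$.
Let the second-phase attacker payoffs be $c'_{1,t,j,a} = 4$, $c'_{2,t,j,a} = -2$, $c'_{3,t,j,a} = -1$ and $c'_{1,t,j,b} = 0$, $c'_{2,t,j,b} = 2$, $c'_{3,t,j,b} = 3$.
Then
\begin{align*}
\sum_{i \in I_{t,j,0}} x_i c'_{i,t,j,a} &= \tfrac12 \cdot 4 + \tfrac14 \cdot (-2) = \tfrac32, \\
\sum_{i \in I_{t,j,0}} x_i c'_{i,t,j,b} &= \tfrac12 \cdot 0 + \tfrac14 \cdot 2 = \tfrac12, \\
\sum_{i \in I_{t,j,1}} x_i c'_{i,t,j,a} &= \tfrac14 \cdot (-1) = -\tfrac14, \\
\sum_{i \in I_{t,j,1}} x_i c'_{i,t,j,b} &= \tfrac14 \cdot 3 = \tfrac34.
\end{align*}
Hence $k_{t,j,0} = a$ and $k_{t,j,1} = b$: after observing $c = 0$ the attacker plays $a$, with conditional expected payoff $\tfrac32 / \tfrac34 = 2$, and after observing $c = 1$ he plays $b$, with conditional expected payoff $\tfrac34 / \tfrac14 = 3$.
In the notation of Proposition~\ref{pro:first move}, the value of the first-phase move $j$ in this instance is $\tfrac14 + \tfrac32 + \tfrac34 = \tfrac52$.
\end{example}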


\begin{proposition}\label{pro:first move}
  Assume that in the first phase an attacker of type $1 \leq t \leq n$ plays against the defender's strategy $x \in \Prob(I)$.
  His first-phase strategy $y^t$ is optimal if and only if it maximizes
  \begin{align*}
  \E(C + C'|T = t) = \\
\sum_{j \in J_t} y^t_j \left(
\sum_{i \in I} x_i c_{i, t, j} + 
\sum_{c \in \C_{t, j}} \max_{k \in K_t} \sum_{i \in I_{t, j,c}} x_i c'_{i, t, j, k}
\right).
  \end{align*}
Hence any strategy $y^t$ that distributes probability among moves $j \in J_t$ with maximal 
  \[ \sum_{i \in I} x_i c_{i, t, j} + 
  \sum_{c \in \C_{t, j}} \max_{k \in K_t} \sum_{i \in I_{t, j, c}} x_i c'_{i, t, j, k}\] is optimal.
  There always exists a pure optimal strategy $y^t$,
  i.e., without loss of generality, we may assume that an
  optimal attacker's first-phase strategy satisfies $y^t_j \in \{ 0, 1 \}$ for each $j \in J_t$.
\end{proposition}
\begin{proof} 
%Directly from the definition and from~\eqref{eq:joint distribution}, 

We have
\begin{align*}
\E(C | T = t) = \sum_{i \in I} \sum_{j \in J_t} x_i y^t_j c_{i, t, j}.
\end{align*}
On the other hand
\begin{align*}
  \E(C' | T = t) = \sum_{j \in J_t}  \sum_{i \in I} y^t_j x_i \E(C' | T = t, Y = j, X = i) = \\
  = \sum_{j \in J_t} y^t_j \left( \sum_{i \in I} x_i \left( \sum_{k \in K_t} z^{t, j, c_{i, t, j}}_k c'_{i, t, j, k} \right) \right)
  = \\
  = \sum_{j \in J_t} y^t_j \left( 
  \sum_{c \in \C_{t, j}} \sum_{i \in I_{t, j, c}} \sum_{k \in K_t} z^{t, j, c}_k x_i c'_{i, t, j, k}
  \right) = \\
  = \sum_{j \in J_t} y^t_j \left( 
  \sum_{c \in \C_{t, j}} \sum_{k \in K_t} z^{t, j, c}_k \left( \sum_{i \in I_{t, j, c}} x_i c'_{i, t, j, k} \right)
  \right).
\end{align*}

Since $\E(C | T = t)$ does not depend on $z^{t,j,c}$, optimal strategies $z^{t, j, c}$ should be chosen so as to maximize $\E(C' | T = t)$. Since each value $z^{t,j,c}_k$ appears exactly once in the formula, it suffices to set $z^{t,j,c}_k = 1$ for a move $k$ with the largest coefficient, for each $j$ and $c$.
Hence with optimal attacker's response, we have
\[
\E(C' | T = t) = \sum_{j \in J_t} y^t_j \left( \sum_{c \in \C_{t, j}} \max_{k \in K_t} \sum_{i \in I_{t, j, c}} x_i c'_{i, t, j, k} \right).
\]
We are done, since $\E(C + C'| T = t) = \E(C | T = t) + \E(C' | T = t)$.
\end{proof}

\begin{corollary}\label{cor:first move}
  Assume that the defender's strategy $x \in \Prob(I)$ is played against an attacker of type $1 \leq t \leq n$. Then the defender's expected payoff against an optimal attacker's strategy is
\[
\E(R + R' | T = t) = \left(\sum_{i \in I} x_i r_{i, t, j_t}\right)
  + \E(R' | T = t, Y = j_t),
\]
where
\[
j_t \in \argmax_{j \in J_t} \left( \sum_{i \in I} x_i c_{i, t, j} + \sum_{c \in \C_{t, j}} \max_{k \in K_t} \sum_{i \in I_{t, j, c}} x_i c'_{i, t, j, k} \right)
\]
and $\E(R' | T = t, Y=j_t)$ is the payoff computed in the
statement of Corollary~\ref{cor:second move}.
\end{corollary}
\begin{remark}\label{rem:first move}
The defender's payoffs computed in Corollaries~\ref{cor:first move} and~\ref{cor:second move} depend on the choices of $j_t$ and $k_{t, j, c}$, respectively. If multiple choices are possible, we assume, following the existing literature, that ties are broken in favor of the defender, i.e., a choice that maximizes the defender's payoff is made.
\end{remark}

\begin{figure*}
\centering
\begin{maxi!}[3]<b>{
x_i, y^t_j, z^{t,j,c}_k, \gamma_{t,j,c}, 
s_{i,t,j,k}, u_{t,j,c}, w_{i,t,j,k}
}{
\sum_{\substack{1 \leq t \leq n, i \in I,\\ j \in J_t, k \in K_t}} p_t \left( r_{i,t,j} + r'_{i,t,j,k} \right) w_{i,t,j,k}\hspace{4cm}
}{}{\label{milp:formulation}}
\addConstraint{\sum_{i \in I} x_i = 1,}{ }\label{milp:x probability}
\addConstraint{\sum_{j \in J_t} y^t_j = 1}{}{\text{ for each } 1 \leq t \leq n,}\label{milp:y probability}
\addConstraint{\sum_{k \in K_t} z^{t, j,c}_k = 1}{}{\text{ for each } 1 \leq t \leq n, j \in J_t, c \in \C_{t, j},}\label{milp:z probability}
\addConstraint{\gamma_{t, j, c} \geq \sum_{i \in I_{t, j, c}} x_i c'_{i, t, j, k}}{}{ \begin{aligned}\text{ for each } 1 \leq t \leq n, j \in J_t, \\ c \in \C_{t, j}, k \in K_t,\end{aligned}}\label{milp:second move constraint one}
\addConstraint{\sum_{k \in K_t} \sum_{i \in I} s_{i, t, j, k} c'_{i,t,j,k} \geq \sum_{c \in \C_{t, j}} \gamma_{t, j,c}}{}{\text{ for each } 1 \leq t \leq n, j \in J_t,}\label{milp: second move constraint two}
\addConstraint{
\begin{aligned}
\sum_{m \in J_t} \left( \left( \sum_{i \in I} \left( \sum_{k \in K_t} w_{i,t,m,k} \right) c_{i, t, m}\right) + \sum_{c \in \C_{t, m}} u_{t, m, c} \right) \\ \geq \sum_{i \in I} x_i c_{i, t, j} + \sum_{c \in \C_{t, j}} \gamma_{t, j, c}
\end{aligned}
}{}{\text{ for each } 1 \leq t \leq n, j \in J_t,
}\label{milp:first move constraint}
% substitution constraints
%
% variable s
%
\addConstraint{ s_{i, t, j, k} \leq z^{t, j, c_{i,t,j}}_k}{}{\text{ for each } i \in I, 1 \leq t \leq n, j \in J_t, k \in K_t,}\label{milp:subsitution s 2}
\addConstraint{ \sum_{k \in K_t} s_{i, t, j, k} = x_i}{}{\text{ for each } i \in I, 1 \leq t \leq n, j \in J_t,}\label{milp:subsitution s 1}
%
% variable u
%
\addConstraint{
-M(1 - y^t_j) \leq u_{t, j, c} - \gamma_{t, j, c} \leq M(1 - y^t_j)}{}{\text{ for each } 1 \leq t \leq n,  j \in J_t, c \in \C_{t, j},}\label{milp:u1}
\addConstraint{-M y^t_j \leq u_{t, j, c} \leq M y^t_j}{}{\text{ for each } 1 \leq t \leq n, j \in J_t, c \in \C_{t, j},}\label{milp:u2}
%
% variable w
%
\addConstraint{ \sum_{j \in J_t} \sum_{k \in K_t} w_{i, t, j, k} = x_i}{}{\text{ for each } i \in I, 1 \leq t\leq n,}\label{milp:w1}
\addConstraint{ w_{i, t, j, k} \leq y^t_j}{}{\text{ for each } i \in I, 1 \leq t \leq n, j \in J_t, k \in K_t,}\label{milp:w2}
\addConstraint{w_{i, t, j, k} \leq z^{t, j, c_{i,t,j}}_k}{}{\text{ for each } i \in I, 1 \leq t \leq n, j \in J_t, k \in K_t,}\label{milp:w3}
%
\addConstraint{x_i, s_{i, t, j, k}, w_{i, t, j, k} \geq 0, \gamma_{t,j,c}, u_{t,j,c} \in \mathbb{R}, y^t_j, z^{t,j,c}_k \in \{ 0, 1 \}.}\label{milp:positiveness constraint}
\end{maxi!}
\caption{A mixed-integer linear programming formulation of two-phase security games.}
\label{fig:milp formulation}
\end{figure*}

\subsection{Solving two-phase games}\label{ssec:milp formulation}

Using Propositions~\ref{pro:first move} and~\ref{pro:second move}, we derive the following quadratic programming formulation of two-phase security games.

\begin{maxi!}<b>{x_i, y^t_j, z^{t, j, c}_k, \gamma_{t, j, c}}{\sum_{t = 1}^n \sum_{i \in I} \sum_{j \in J_t} p_t x_i y^t_j \times }{}{}  \nonumber\breakObjective{\left( r_{i,t,j} + \sum_{k \in K_t} z^{t, j, c_{i,t,j}}_k r'_{i,t,j,k} \right)}\label{mqlp:formulation}
\addConstraint{\sum_{i \in I} x_i = 1,}\label{mqlp:x probability}
\addConstraint{\sum_{j \in J_t} y^t_j = 1 \text{ for each } 1 \leq t \leq n,\label{mqlp:y probability}}
\addConstraint{\sum_{k \in K_t} z^{t, j,c}_k = 1 \text{ for each } 1 \leq t \leq n, j \in J_t, c \in \C_{t, j},}\label{mqlp:z probability}
\addConstraint{\begin{aligned} \gamma_{t, j, c} & \geq \sum_{i \in I_{t, j, c}} x_i c_{i, t, j, k}' \\ 
& \text{ for each } 1 \leq t \leq n, j \in J_t, c \in \C_{t, j}, k \in K_t,\end{aligned}}\label{mqlp:second move constraint one}
\addConstraint{\begin{aligned}
\sum_{k \in K_t} \sum_{i \in I} z^{t, j, c_{i, t, j}}_k x_i c'_{i,t,j,k} \geq \sum_{c \in \C_{t, j}} \gamma_{t, j,c} \\
\text{ for each } 1 \leq t \leq n, j \in J_t,
\end{aligned}}{\label{mqlp: second move constraint two}}
\addConstraint{\begin{aligned}\sum_{j \in J_t} y^t_j \left(\sum_{i \in I} x_i c_{i, t, j} + \sum_{c \in \C_{t, j}} \gamma_{t, j, c} \right) \geq   \sum_{i \in I} x_i c_{i, t, j} \\ + \sum_{c \in \C_{t, j}} \gamma_{t, j, c}  \text{ for each } 1 \leq t \leq n, j \in J_t, \end{aligned}}\label{mqlp:first move constraint}
\addConstraint{x_i, y^t_j, z^{t,j,c}_k \geq 0, \gamma_{t,j,c} \in \mathbb{R}.}\label{mqlp:positivness constraint}
\end{maxi!}

Using a linearization technique, we derive the mixed integer linear programming formulation~\eqref{milp:formulation} listed in Figure~\ref{fig:milp formulation}.
The details of the derivation of the quadratic programming formulation~\eqref{mqlp:formulation} and the mixed integer linear programming formulation~\eqref{milp:formulation} are given in the appendix.
Note that a two-phase security game may also be expressed in extensive form, and the technique from~\citep{bosansky2015sequence} may be used to derive a mixed-integer linear programming formulation of similar size.
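
\begin{remark}
The role of the auxiliary variables in~\eqref{milp:formulation} can be sketched as follows; this is only an informal outline under the assumption that $M$ denotes a sufficiently large constant (at least $|\gamma_{t,j,c}|$ for all feasible values), and it does not replace the derivation in the appendix.
The variable $w_{i,t,j,k}$ stands for the product $x_i y^t_j z^{t,j,c_{i,t,j}}_k$.
Since $y^t$ and $z^{t,j,c}$ are binary and satisfy~\eqref{milp:y probability} and~\eqref{milp:z probability}, there is exactly one move $j_t$ with $y^t_{j_t} = 1$ and, for each $i \in I$, exactly one move $k_i$ with $z^{t,j_t,c_{i,t,j_t}}_{k_i} = 1$.
Constraints~\eqref{milp:w2} and~\eqref{milp:w3}, together with $w_{i,t,j,k} \geq 0$, force $w_{i,t,j,k} = 0$ for all other pairs $(j, k)$, and~\eqref{milp:w1} then yields $w_{i,t,j_t,k_i} = x_i$, that is,
\[
w_{i,t,j,k} = x_i \, y^t_j \, z^{t,j,c_{i,t,j}}_k .
\]
The same argument applied to~\eqref{milp:subsitution s 2} and~\eqref{milp:subsitution s 1} gives $s_{i,t,j,k} = x_i z^{t,j,c_{i,t,j}}_k$.
Finally, $u_{t,j,c}$ stands for the product $y^t_j \gamma_{t,j,c}$: if $y^t_j = 1$, then~\eqref{milp:u1} gives $u_{t,j,c} = \gamma_{t,j,c}$, and if $y^t_j = 0$, then~\eqref{milp:u2} gives $u_{t,j,c} = 0$.
\end{remark}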

\section{Comparison to the standard Bayesian Stackelberg games}
\label{sec:comparison}

In this section, we show that it is possible to reduce a two-phase Bayesian Stackelberg game to a one-phase Bayesian Stackelberg game using a transformation
that is similar to a Harsanyi normal-form transformation.
However, this reduction results in an exponential blow-up of the problem size. Using equations~\eqref{eq:follower payoff} and~\eqref{eq:leader payoff} and the observation that the attackers have optimal \emph{pure} strategies, we can write the following mixed quadratic linear program that solves two-phase Bayesian Stackelberg games.
\begin{maxi!}{}{\E(R + R')}{}{}\label{mqlp:transformed}
\addConstraint{x \in \Prob(I), y^t \in \Prob(J_t), z^{t,j,c} \in \Prob(K_t)}
\addConstraint{
\begin{aligned}
  \E(C & + C') \geq \sum_{i \in I} x_i c_{i, t, j} + \sum_{i \in I} x_i c'_{i, t, j, k_{j,c_{i,t,j}}} \\
& \text{ for each } 1 \leq t \leq n, j \in J_t, \{ k_{j, c} \}_{c \in \C_{t, j}} \subset K_t
\end{aligned}
}\label{mqlp:transformed optimality}
\end{maxi!}
%Condition~\eqref{mqlp:transformed optimality} ensures that the
%  attacker's payoff is not worse than the best payoff he can get using pure strategies $y^t$ and $z^{t, j, c}$.
Note the possibly exponential number of conditions of type~\eqref{mqlp:transformed optimality}, as we have to consider all possible combinations of a first-phase move $j$ and second-phase moves $k_{j,c}$, one for each possible first-phase outcome~$c$.

Note that
%\[
%\sum_{i \in I} x_i c_{i, t, j} + \sum_{i \in I} x_i c'_{i, t, j, k_{j,c_{i,j}}} = \sum_{i \in I} x_i \left( c_{i,t,j} + c'_{i, t, j, k_{j, c_{i,j}}} \right)
%\]
%is attacker's $t$ payoff in a regular (single-phase) Bayesian Stackelberg game
% where 
the set of moves of the attacker of type $t$ in the resulting regular (single-phase) Bayesian Stackelberg game is:
$
  J'_t = \bigcup_{j \in J_t} \{ j \} \times K_t^{\C_{t, j}},
$
where $K_t^{\C_{t, j}}$ is the set of all maps from $\C_{t, j}$ to $K_t$, i.e., a choice of a move $k_c$ from $K_t$ for each possible first-phase outcome $c \in \C_{t, j}$.
%This is the transformed problem, where attacker of type $t$ selects his first-phase and second-phase moves at once, before he learns his first-phase payoff. Hence he has to pick up-front a response move to each possible first-phase outcome. 
%In the transformed problem the defender's payoff matrix against attacker $t$ is:
%$$
%  \left[ r_{i,t,j} + r'_{i, t, j, k_{j, c_{i,j}}} \right]_{i \in I, (j, k) \in J'}, 
%$$
%whereas the attacker's of type $1 \leq t \leq n$ payoff matrix is:
%$$
%  \left[ c_{i,t,j} + c'_{i, t, j, k_{j, c_{i,j}}} \right]_{i \in I, (j, k) \in J'}.
%$$
%Note that in MQLP formulation~\eqref{mqlp:transformed}, 
The size of the set of the follower's moves grows exponentially,
$
  | J'_t | = \sum_{j \in J_t} |K_t|^{|\C_{t, j}|},
$
and so does the number of constraints~\eqref{mqlp:transformed optimality}.
Compare this to the MQLP formulation~\eqref{mqlp:formulation}, where for each attacker type $t$ we have a polynomial number
$
  |J_t||K_t| + 2 |J_t| = \sum_{j \in J_t} (|K_t| + 2)
$
of constraints that correspond to the optimality of the follower's actions. 
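
\begin{example}
To give a sense of scale, consider a hypothetical instance (not one of our benchmark problems) with a single attacker type, so that the index $t$ can be dropped, with $|J| = 5$ first-phase moves, $|K| = 5$ second-phase moves and $|\C_j| = 3$ possible first-phase outcomes for every $j \in J$.
The transformed single-phase game then has
\[
|J'| = \sum_{j \in J} |K|^{|\C_j|} = 5 \cdot 5^3 = 625
\]
attacker moves, whereas the count of follower-optimality constraints stated above is
\[
|J||K| + 2|J| = 25 + 10 = 35.
\]
Increasing $|\C_j|$ from $3$ to $4$ multiplies the first number by $5$ and leaves the second unchanged.
\end{example}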


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\section{Experimental analysis of one-phase and two-phase models}\label{sec:experiments}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%In this experimental analysis, we first experimentally compare the time complexity of optimal strategy computations between different two-phase models. Next, we compare performance of the strategies computed with our model and DOBBS algorithm against a two-phase attack.

\begin{table}
\setlength\tabcolsep{3.8pt}
\renewcommand{\arraystretch}{1.035}
\begin{subtable}{\columnwidth}
\begin{tabular}{llllllll}
\toprule
{} & {1} & {2} & {3} & {4} & {5} & {6} & {7} \\
\midrule
1 & 0.009 & 0.012 & 0.013 & 0.013 & 0.015 & 0.019 & 0.023 \\
2 & 0.010 & 0.015 & 0.062 & 0.777 & 0.897 & 1.015 & 1.816 \\
3 & 0.015 & 0.023 & 0.584 & 0.621 & 1.959 & 1.232 & 1.693 \\
4 & 0.014 & 0.037 & 0.297 & 1.343 & 1.248 & 1.454 & 2.698 \\
5 & 0.009 & 0.028 & 0.162 & 0.605 & 1.450 & 2.166 & 2.501 \\
6 & 0.010 & 0.031 & 0.320 & 0.744 & 1.503 & 7.702 & 13.679 \\
7 & 0.113 & 0.107 & 0.430 & 1.115 & 1.907 & 3.627 & 7.968 \\
\bottomrule
\end{tabular}
\caption{Performance of the MILP formulation, in sec.}
\label{tab:perf1}
\end{subtable}
\begin{subtable}{\columnwidth}
\setlength\tabcolsep{2.8pt}
\begin{tabular}{llllllll}
\toprule
{} & {1} & {2} & {3} & {4} & {5} & {6} & {7} \\
\midrule
1 & 0.008 & 0.009 & 0.009 & 0.010 & 0.012 & 0.015 & 0.019 \\
2 & 0.007 & 0.023 & 0.235 & 1.044 & 2.139 & 3.958 & 6.003 \\
3 & 0.013 & 0.072 & 0.968 & 2.845 & 4.910 & 10.309 & 30.679 \\
4 & 0.014 & 0.102 & 1.124 & 6.962 & 14.434 & - & - \\
5 & 0.008 & 0.076 & 1.099 & 9.569 & 23.458 & - & - \\
6 & 0.013 & 0.183 & 1.401 & 34.405 & - & - & - \\
7 & 0.034 & 0.339 & 1.967 & 40.175 & - & - & - \\
\bottomrule
\end{tabular}
\caption{Performance of the MQLP formulation, in sec.}
\label{tab:perf2}
\end{subtable}
\begin{subtable}{\columnwidth}
\setlength\tabcolsep{3.8pt}
\begin{tabular}{llllllll}
\toprule
{} & {1} & {2} & {3} & {4} & {5} & {6} & {7} \\
\midrule
1 & 0.007 & 0.009 & 0.009 & 0.012 & 0.018 & 0.031 & 0.041 \\
2 & 0.006 & 0.012 & 0.055 & 0.352 & 1.627 & 6.524 & 20.422 \\
3 & 0.012 & 0.043 & 1.567 & - & - & - & - \\
4 & 0.006 & 0.173 & - & - & - & - & - \\
5 & 0.007 & 0.743 & - & - & - & - & - \\
6 & 0.007 & 3.024 & - & - & - & - & - \\
7 & 0.024 & - & - & - & - & - & - \\
\bottomrule
\end{tabular}
\caption{Performance of the DOBSS formulation, in sec.}
\label{tab:perf3}
\end{subtable}
\caption{
Running times of MILP, MQLP and DOBSS, in seconds.
Row and column headers give the number of defender and attacker moves, respectively.
Averaged over $4$ runs. Time limit $60$ seconds; a dash marks cases where the limit was reached.
}
\label{tab:performance}
\end{table}

\begin{figure}
	\begin{tikzpicture}
		\begin{axis}[
			legend style={at={(0.72,1.0)}},
			ymin=0,
			height=6cm,
			width=\columnwidth,
			ymajorgrids,
			ylabel near ticks,
			xlabel near ticks,
			xlabel={},
			xmin=1,
			ymin=0,
			%ymax=60,
			ylabel=Time (s),
			xlabel=Number of moves (attacker)
			]    
% milp form
\addplot+[] coordinates {
	(1,0.010476112365722656)(2,0.0150842547416687)(3,0.11409593820571899)(4,0.24936457872390747)(5,0.35225815773010255)(6,0.45719386339187623)(7,0.6367618083953858)(8,0.9955112218856812)(9,1.0041609406471252)(10,1.2143901586532593)(11,1.4458315134048463)(12,2.0502103328704835)(13,2.5397725701332092)(14,3.089488959312439)(15,3.563098335266113)(16,3.9245975255966186)(17,5.0711515069007875)(18,6.132181632518768)(19,6.997408938407898)(20,8.008444094657898)(21,9.815151393413544)(22,9.304634225368499)(23,12.8732342004776)(24,18.42828687429428)(25,18.275669026374818)(26,23.114855194091795)(27,21.12606347799301)(28,18.75153329372406)(29,28.433011531829834)(30,38.71324617862702)(31,47.47981103658676)(32,59.209393656253816)(33,33.9998899936676)(34,57.01368236541748)(35,75.7700476527214)(36,72.96317315101624)(37,58.53309000730515)(38,87.3597840666771)(39,82.71771936416626)(40,72.28504519462585)(41,179.81232165098191)(42,127.86401740312576)(43,122.09564769268036)(44,125.23623090982437)(45,182.3435191631317)(46,186.50986961126327)(47,245.68122218847276)(48,261.5787557244301)(49,215.1347699403763)(50,219.2781986474991)(51,236.41421556472778)(52,245.65099618434905)(53,355.3957396864891)(54,268.137425327301)(55,290.14264529943466)(56,384.0750267744064)(57,402.0512573003769)(58,310.15571949481966)(59,505.13417981863023)(60,469.58935539722444)(61,400.9217622756958)(62,382.7585603952408)(63,444.8841572642326)(64,490.4952214360237)(65,486.473743891716)(66,507.0250289082527)(67,600)
};
% mqlp form
\addplot+[] coordinates {
	(1,0.01041024923324585)(2,0.052164530754089354)(3,0.49627937078475953)(4,1.323216152191162)(5,2.511202037334442)(6,8.830553615093232)(7,16.491569066047667)(8,56.13447879552841)(9,356.4694398403168)(10,452.6718277335167)(11,414.1486938714981)(12,600)
};
% normal form
\addplot+[] coordinates {
	(1,0.0067947149276733395)(2,0.02204136848449707)(3,0.9960790872573853)(4,44.18104326725006)(5,566.1097449183465)(6,600)
};
\addlegendentry{Two-phase MILP}
\addlegendentry{Two-phase MQLP}
\addlegendentry{DOBSS}
\end{axis}
\end{tikzpicture}
\caption{Running time (averaged over $20$ runs) for random problems with 3 defender moves; the number of attacker moves is shown on the $x$ axis. Two-phase attack with 2 attacker types. Time limit 600 seconds.}
\label{fig:running time graph}
\end{figure}

\begin{table}[t]
\setlength\tabcolsep{0pt}
\begin{tabular*}{\columnwidth}{@{\extracolsep{\fill}}@{\extracolsep{\fill}}rr@{\hskip 1pt}rr@{\hskip 1pt}rr@{\hskip 1pt}rr@{\hskip 1pt}rr@{\hskip 1pt}r}
\toprule
 &  $(3,1)$& $\hspace{1mm} (3,2)$ & $ (4,2)$ & $ \hspace{1mm}(5,2)$ \\
\midrule
Constant & 33\%, 33\% & 50\%,\ \ \  0\%& 100\%, 0\% & 100\%, 0\% \\
Linear   & 41\%, 32\% & 50\%, 30\%& 100\%, 0\% & 46\%, 0\%\\
Exponential   & 43\%, 32\% & 50\%, 30\%& 100\%, 0\% & 48\%, 0\%\\
\bottomrule
\end{tabular*}
\caption{Chance that the defender's optimal strategy is vulnerable to a two-phase attack. Column headers give the number of border segments and the number of patrols. In each column, the number on the left is the chance for the mixed strategy computed against a one-phase attack with DOBSS, and the number on the right is the chance for the mixed strategy computed with the model proposed in this paper. Lower is better.}
\label{tab:chancesunprotected}
\end{table}

\section{Experimental evaluation}

We evaluated the performance of the three algorithms considered in the paper: the mixed quadratic linear program (MQLP), the mixed integer linear program (MILP), and DOBSS applied to a single-phase problem obtained by transforming the two-phase problem.
We also verified that the observation of Section~\ref{sec:motivating_example} (Figures~\ref{fig:strategy} and~\ref{fig:payoffs}), namely that a defensive strategy computed against a single-phase attack is vulnerable to a two-phase attack, holds in general. 

\subsection{Time complexity of optimal strategy computations in two-phase models}

Table~\ref{tab:performance} shows a comparison of three algorithms that compute optimal strategies in two-phase Bayesian Stackelberg games. 
Table~\ref{tab:perf1} shows solution times for the mixed integer linear programming (MILP) formulation~\eqref{milp:formulation}.
Table~\ref{tab:perf2} shows solution times for the mixed quadratic linear programming (MQLP) formulation~\eqref{mqlp:formulation}.
Finally, Table~\ref{tab:perf3} shows solution times for DOBSS applied to the one-phase problem obtained with the transformation described in Section~\ref{sec:comparison}.

In each table, the row number is the number of defender's moves and the column number is the number of attacker's moves.
The time is measured in seconds and was averaged over $4$ independent runs.
Cases where the time limit of $60$ seconds was reached are marked with `-'.

Figure~\ref{fig:running time graph} shows an analogous comparison for larger numbers of attacker moves and two attacker types.
All computations were performed with the SCIP solver on a single core of an Intel Xeon 3.60\,GHz processor.

\subsection{Comparison of strategies against two-phase attacks}

The example from Section~\ref{sec:motivating_example} shows that a defensive strategy computed against a single-phase attack can be vulnerable to a two-phase attack. 
Table~\ref{tab:chancesunprotected} shows that this is a general phenomenon (i.e., the example is not cherry-picked). 
In particular, we checked that this pattern (severe loss against a two-phase attack) emerges for different value profiles (constant, linear, exponential) of the defended targets and over different ranges of random noise. 

A defender's mixed strategy is {\em vulnerable} to a two-phase attack if it permits a first-phase attack such that knowledge of its outcome guarantees that the second-phase attack will be successful.
We considered a border patrolling game with $3$, $4$ or $5$ border segments and $1$ or $2$ patrols.
We considered three payoff profiles for the attacker: constant (a successful attack on each segment of the border is of equal value to the attacker), linear (the value grows linearly) and exponential.
Results are averaged over $4$ runs.

%In the tables below we show payoff value change when attackers vary from one-phase to two-phase, using strategies computed with the help of the DOBBS algorithm and our model for various sizes and scalings of the problem.

%\setlength\tabcolsep{2pt}
%\begin{table}[ht]
%\caption{Comparison of payoffs against single stage attack.}
%\begin{tabular*}{\columnwidth}{@{\extracolsep{\fill}}@{\extracolsep{\fill}}rr@{\hskip 1pt}rr@{\hskip 1pt}rr@{\hskip 1pt}rr@{\hskip 1pt}rr@{\hskip 1pt}r}
%\toprule
% &  $(3,1)$& $\hspace{1mm} (3,2)$ & $ (4,2)$ & $ \hspace{1mm}(5,2)$ \\
%\midrule
%Constant & -7.6, -7.6 & 3.7, 0.0& 2.6, 1.13 & -2.32, -3.48 \\
%Linear   & -11.21, -18.95 & 3.7, 0.0& 2.6, -8.51 & -7.02, -14.91\\
%Exponential   & -11.53, -24.92 & 3.7, 0.0& 2.6, -9.67 & -8.28, -21.51\\
%\bottomrule
%\end{tabular*}
%\end{table}


%\setlength\tabcolsep{3pt}
%\begin{table}[ht]
%\caption{Comparison of payoffs against two-stage attack.}
%\begin{tabular*}{\columnwidth}{@{\extracolsep{\fill}}@{\extracolsep{\fill}}rr@{\hskip 1pt}rr@{\hskip 1pt}rr@{\hskip 1pt}rr@{\hskip 1pt}rr@{\hskip 1pt}r}
%\toprule
% &  $(3,1)$& $\hspace{1mm} (3,2)$ & $ (4,2)$ & $ \hspace{1mm}(5,2)$ \\
%\midrule
%Constant & -23.0, -23.0 & -24.0, 10.67& -61.0, -7.17 & -33.0, -9.6 \\
%Linear   & -88.77, -62.29 & -74.0, 11.15& -165.5, -18.13 & -146.65, -41.11\\
%Exponential   & -107.86, -78.80 & -99.0, 11.04& -265.5, -20.94 & -280.44, -57.96\\
%\bottomrule
%\end{tabular*}
%\end{table}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Related Work}\label{sec:related:work}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The literature on security games is vast and continuously growing (surveys can be found in \citet{sinha2018stackelberg} and \citet{fang2016green}).
The first related body of work concerns multi-stage Stackelberg games in which the attacker and defender interact in stages. In \citet{LUH1984251}, the authors analysed systems where players choose among pure strategies. In \citep{zychowski2022coevolutionary}, an evolutionary algorithm for solving multi-stage Stackelberg games was proposed, whereas in \citep{guzman2022sequential}, an inspection game is formalized as a multiple-stage Stackelberg game.
%, which exhibits some similarities with our model, it differs however, in several ways: only one patrol at the defender's disposal, which changes position after the first phase, a specific form of payoff functions and multiple attackers in each phase. 
Two-stage (but not two-phase) Stackelberg games were considered in the literature, e.g., in \citep{anand2008strategic,gray2009outsourcing,kabul2019value,wang2022pollution}. 

Our model can also be understood as a method to prevent a deception attack (see \citep{kar2015game} for an example). To this end, let us assume that the attacker, aware that the defender relies on the DOBSS algorithm (against a one-phase attack), chooses an appropriate two-phase strategy for which the defender is unprepared. If the defender uses our algorithm instead, his strategy already accounts for such a two-phase attack. 

%In the past, the deception of the defender's strategy was considered in the literature from various perspectives. In , the authors considers the case of a defender using PAWS-like algorithms. Trying to learn on the past opponents behaviour he is being decepted by an adversarial misleading actions. Pretending to be bounded rational, the opponent influences the leader to wrongly adjust his defense, aiming at maximizing cumulative payoff in the long run sequential strategy.

%\begin{table}[t]
%\begin{tabular*}{\columnwidth}{@{\extracolsep{\fill}}@{\extracolsep{\fill}}rr@{\hskip 1pt}rr@{\hskip 1pt}rr@{\hskip 1pt}rr@{\hskip 1pt}rr@{\hskip 1pt}r}
%\toprule
% & \multicolumn{2}{c}{$T_1$} & \multicolumn{2}{c}{$T_2$} & \multicolumn{2}{c}{$T_3$} & \multicolumn{2}{c}{$T_4$} & \multicolumn{2}{c}{$\emptyset$}\\
%\midrule
%$T_1T_2$ & 14,& -10 & 23,& -20 & -34,& 30 & -42,& 40 & 0,& 0\\
%$T_1T_3$ & 10,& -10 & -20,& 20 & 32,& -30 & -43,& 40 & 0,& 0\\
%$T_1T_4$ & 12,& -10 & -23,& 20 & -33,& 30 & 44,& -40 & 0,& 0\\
%$T_2T_3$ & -11,& 10 & 24,& -20 & 31,& -30 & -41,& 40 & 0,& 0\\
%$T_2T_4$ & -11,& 10 & 20,& -20 & -31,& 30 & 42,& -40 & 0,& 0\\
%$T_3T_4$ & -11,& 10 & -21,& 20 & 34,& -30 & 44,& -40 & 0,& 0\\
%\bottomrule
%\end{tabular*}
%\caption{Payoffs of the defender and an attacker of type $1$.}
%\label{tab:example1playerL1}
%\begin{tabular*}{\columnwidth}{@{\extracolsep{\fill}}@{\extracolsep{\fill}}rr@{\hskip 1pt}rr@{\hskip 1pt}rr@{\hskip 1pt}rr@{\hskip 1pt}rr@{\hskip 1pt}r}
%\toprule
% & \multicolumn{2}{c}{$T_1$} & \multicolumn{2}{c}{$T_2$} & \multicolumn{2}{c}{$T_3$} & \multicolumn{2}{c}{$T_4$} & \multicolumn{2}{c}{$\emptyset$}\\
%\midrule
%$T_1T_2$ & 51,& -50 & 102,& -100 & -152,& 150 & -211,& 200 & 0,& 0\\
%$T_1T_3$ & 55,& -50 & -123,& 100 & 175,& -150 & -221,& 200 & 0,& 0\\
%$T_1T_4$ & 59,& -50 & -108,& 100 & -169,& 150 & 206,& -200 & 0,& 0\\
%$T_2T_3$ & -69,& 50 & 101,& -100 & 168,& -150 & -221,& 200 & 0,& 0\\
%$T_2T_4$ & -55,& 50 & 113,& -100 & -170,& 150 & 212,& -200 & 0,& 0\\
%$T_3T_4$ & -75,& 50 & -123,& 100 & 166,& -150 & 211,& -200 & 0,& 0\\
%\bottomrule
%\end{tabular*}
%\caption{Payoffs of the defender and an attacker of type $2$.}
%\label{tab:example1playerL2}
%\end{table}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Conclusions}\label{sec:conclusions}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

We introduced an extension of a standard Bayesian Stackelberg game that takes into account the possibility that an attack can consist of two phases. In our model, the attacker makes a preliminary strike in the first phase in order to gain extra intelligence about the defense. Next, the attacker is able to make a more informed choice of the concluding move. The model is motivated by a pattern of attacks observed on the Belarus-Ukraine border.

The usual setting of Stackelberg games assumes a large asymmetry between the defender and the attacker: on the one hand, it is assumed that the attacker has perfect knowledge of the defender's past actions; on the other hand, it is assumed that the attacker has zero knowledge of the defender's current defensive position.
The model proposed in the paper reduces this asymmetry: it considers scenarios where the attacker may undertake some actions to gain knowledge about the defender's current defensive position. 

For this new model, we derived a compact-form MILP formulation and we showed analytically that the reduction in problem size compared to a standard approach is exponential. 
Our results also revealed that using the standard approach to defend against a two-phase attack can lead to severe losses on the defender's side.

%\begin{contributions} % will be removed in pdf for initial submission 
					  % (without ‘accepted’ option in \documentclass)
                      % so you can already fill it to test with the
                      % ‘accepted’ class option
%\end{contributions}

% References
\bibliography{references}

\end{document}
