\section{Preliminaries on Diffusion Models}
\label{sec:diffusion}


% \subsection{Modeling Attacker Behavior with Conditional Diffusion Models}

% In green security, capturing the potentially multimodal nature of poacher behavior is crucial for robust patrol planning. To model the poacher’s response \(\mathbf{z}\) given contextual features \(\mathbf{c}\) (e.g., prior patrol effort, terrain), we employ a \emph{conditional} diffusion model, \(p_\theta(\mathbf{z} \mid \mathbf{c})\).

%real poacher response (e.g., a historical poaching record)

%$q(\mathbf{z}^t \mid \mathbf{z}^{t-1}) = \mathcal{N}\!\bigl(\mathbf{z}^t;\, \sqrt{1-\beta_t}\,\mathbf{z}^{t-1},\, \beta_t \mathbf{I}\bigr),$ 
A diffusion model \citep{sohl2015deep} is a generative framework composed of two stochastic processes: a \emph{forward} process that progressively adds Gaussian noise to real data, and a \emph{reverse} (or denoising) process that learns to remove this noise step by step. Formally, let \(\mathbf{z}^0 \sim \mathcal{D}\) be a sample from the training dataset.\footnote{We use \(\mathbf{z}^0\) and \(\mathbf{z}\) interchangeably when there is no ambiguity.} The forward diffusion process can be written as
$q(\mathbf{z}^t \mid \mathbf{z}^{t-1}) 
= \mathcal{N}\!\bigl(\mathbf{z}^t;\,\mathbf{z}^{t-1},\,\beta^2 \mathbf{I}\bigr),$
where \(\beta^2\) is the variance of the noise added at each step \(t=1,\dots,T\). When \(T\) is large enough that the accumulated noise variance \(T\beta^2\) dominates the scale of the data, repeated noising transforms the data distribution into (approximately) pure Gaussian noise:
$q(\mathbf{z}^T) \approx \mathcal{N}(\mathbf{0},\,T \beta^2 \mathbf{I}).$
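
To see where this approximation comes from, the forward process can be unrolled; the following is a brief sketch in which the noise variables \(\boldsymbol{\epsilon}_s\) are notation introduced here for illustration. Each step can be written as \(\mathbf{z}^{t} = \mathbf{z}^{t-1} + \beta\,\boldsymbol{\epsilon}_t\) with \(\boldsymbol{\epsilon}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\) drawn independently, so that
\[
\mathbf{z}^{t} = \mathbf{z}^{0} + \beta \sum_{s=1}^{t} \boldsymbol{\epsilon}_s
\quad\Longrightarrow\quad
q(\mathbf{z}^{t} \mid \mathbf{z}^{0}) = \mathcal{N}\!\bigl(\mathbf{z}^{t};\, \mathbf{z}^{0},\, t\beta^2 \mathbf{I}\bigr),
\]
because the sum of \(t\) independent standard Gaussian vectors has covariance \(t\,\mathbf{I}\). At \(t = T\), the mean \(\mathbf{z}^0\) becomes negligible relative to the standard deviation \(\sqrt{T}\beta\), which gives the stated approximation.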

\textbf{Score-based Approximation.} To invert this process (i.e., to denoise and recover samples from the original data distribution), one can approximate the reverse transition 
\(
q(\mathbf{z}^{t-1} \mid \mathbf{z}^t) 
\)
via the \emph{score function} \(\nabla_{\mathbf{z}^t} \log q(\mathbf{z}^t)\), provided that \(\beta\) is small. Specifically,
\[
q(\mathbf{z}^{t-1} \mid \mathbf{z}^t) 
\,\approx\, 
\mathcal{N}\!\Bigl(\mathbf{z}^{t-1};\,
\mathbf{z}^t 
+ \beta^2 \,\nabla_{\mathbf{z}^t}\! \log q(\mathbf{z}^t),\,
\beta^2 \mathbf{I}\Bigr).
\]
Here, \(q(\mathbf{z}^t) = \int q(\mathbf{z}^0)\,q(\mathbf{z}^t \mid \mathbf{z}^0)\, d\mathbf{z}^0\), and the gradient \(\nabla_{\mathbf{z}^t}\! \log q(\mathbf{z}^t)\) points toward regions of higher data density. In practice, we do not know \(q(\mathbf{z}^t)\) in closed form, so a neural \emph{score network} \(s_{\theta}(\mathbf{z}^t, t)\) is trained to approximate this gradient via \emph{denoising score matching} \citep{vincent2011connection, ho2020denoising}. Consequently, the learned reverse (denoising) transition becomes
\[
p_{\theta}(\mathbf{z}^{t-1} \mid \mathbf{z}^t) 
= \mathcal{N}\!\Bigl(\mathbf{z}^{t-1};\,
\mathbf{z}^t 
+ \beta^2\, s_{\theta}(\mathbf{z}^t, t),\,
\beta^2 \mathbf{I}\Bigr).
\]
Starting from an initial Gaussian sample \(\mathbf{z}^T \sim \mathcal{N}(\mathbf{0},\, T \beta^2 \mathbf{I})\) and applying this reverse transition for \(t = T, T-1, \dots, 1\) yields samples \(\mathbf{z}^0\) whose distribution approximates the original data distribution.
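
For concreteness, one common form of the denoising score matching objective for this forward process can be sketched as follows; the weighting function \(w(t) > 0\) and the uniform sampling of \(t\) are assumptions made here for illustration, not choices stated above. Since the forward steps add independent Gaussian noise, \(q(\mathbf{z}^t \mid \mathbf{z}^0) = \mathcal{N}(\mathbf{z}^t;\, \mathbf{z}^0,\, t\beta^2 \mathbf{I})\), whose score is available in closed form:
\[
\nabla_{\mathbf{z}^t} \log q(\mathbf{z}^t \mid \mathbf{z}^0) = -\frac{\mathbf{z}^t - \mathbf{z}^0}{t\beta^2}.
\]
The score network can then be fit by regressing onto this conditional score,
\[
\mathcal{L}(\theta) = \mathbb{E}_{t,\;\mathbf{z}^0 \sim \mathcal{D},\;\mathbf{z}^t \sim q(\cdot \mid \mathbf{z}^0)}
\Bigl[\, w(t)\, \Bigl\| s_{\theta}(\mathbf{z}^t, t) + \frac{\mathbf{z}^t - \mathbf{z}^0}{t\beta^2} \Bigr\|^2 \Bigr],
\]
whose minimizer matches the marginal score \(\nabla_{\mathbf{z}^t} \log q(\mathbf{z}^t)\) even though the regression target depends on \(\mathbf{z}^0\).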

\textbf{Conditional Extension.} This diffusion framework can be naturally extended to include additional context \(\mathbf{c}\). In a \emph{conditional} diffusion model~\citep{ho2021classifier}, the score network becomes \(s_{\theta}(\mathbf{z}^t, t, \mathbf{c})\), so that at each step the denoising is informed by side information such as class labels, textual descriptions, or other relevant features. This conditional approach enables the generation of samples that match not only the learned data distribution but also the specific context \(\mathbf{c}\), making it particularly useful for tasks in which external conditions strongly influence the underlying data generation process.
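
As a sketch of what this looks like in the notation above (the conditional transition symbol \(p_{\theta}(\mathbf{z}^{t-1} \mid \mathbf{z}^t, \mathbf{c})\) is introduced here for illustration), the conditional score network approximates \(\nabla_{\mathbf{z}^t} \log q(\mathbf{z}^t \mid \mathbf{c})\), and the reverse transition becomes
\[
p_{\theta}(\mathbf{z}^{t-1} \mid \mathbf{z}^t, \mathbf{c})
= \mathcal{N}\!\Bigl(\mathbf{z}^{t-1};\,
\mathbf{z}^t + \beta^2\, s_{\theta}(\mathbf{z}^t, t, \mathbf{c}),\,
\beta^2 \mathbf{I}\Bigr).
\]
Sampling proceeds as in the unconditional case: draw \(\mathbf{z}^T \sim \mathcal{N}(\mathbf{0},\, T\beta^2 \mathbf{I})\) and apply this transition for \(t = T, \dots, 1\) with \(\mathbf{c}\) held fixed.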

% Instead of directly estimating the score function \( s_{\theta}(\mathbf{z}^t, t) \), Denoising Diffusion Probabilistic Models (DDPM) reparameterize the learning objective in terms of \textit{noise prediction}. This is motivated by the fact that, given the closed-form expression for \( q(\mathbf{z}^t \mid \mathbf{z}^0) \), we can rewrite: $\mathbf{z}^t = \sqrt{\bar{\alpha}_t} \mathbf{z}^0 + \sqrt{1 - \bar{\alpha}_t} \boldsymbol{\epsilon}, \quad \boldsymbol{\epsilon} \sim \mathcal{N}(0, \mathbf{I}).$ Thus, the goal of DDPM training is to predict the noise \(\boldsymbol{\epsilon}\) that was added to transform \(\mathbf{z}^0\) into \(\mathbf{z}^t\). A neural network \(\boldsymbol{\epsilon}_\theta(\mathbf{z}^t, t)\) is trained to approximate \(\boldsymbol{\epsilon}\), which is equivalent to learning the score function up to a scaling factor: $s_{\theta}(\mathbf{z}^t, t) = - \frac{\boldsymbol{\epsilon}_\theta(\mathbf{z}^t, t)}{\sqrt{1 - \bar{\alpha}_t}}.$


% Training the diffusion model then reduces to minimizing the mean squared error (MSE) loss between the true noise \(\boldsymbol{\epsilon}\) and the predicted noise \(\boldsymbol{\epsilon}_\theta\):
% \[
% \mathcal{L}_{\text{simple}} = \mathbb{E}_{\mathbf{z}_0,\, \boldsymbol{\epsilon},\, t}\!
%   \bigl[\|\boldsymbol{\epsilon} 
%     - \boldsymbol{\epsilon}_\theta(\mathbf{z}_t, t, \mathbf{c})\|^2\bigr].
% \]

% By training this conditional diffusion model on historical poaching data—augmented with contextual variables \(\mathbf{c}\)—we learn \( p_\theta(\mathbf{z} \mid \mathbf{c}) \), a flexible and expressive representation of poacher behavior. This allows us to capture complex, multimodal patterns of attacker responses, which in turn informs the robust patrol strategies described in the preceding sections.


\section{Problem Formulation}
In green security settings, a defender (e.g., a ranger) patrols a protected area to prevent resource extraction by an attacker (e.g., a poacher or illegal logger). Let \(K\) denote the number of targets that require protection (e.g., \(1 \times 1\) km regions within the protected area). The defender must allocate patrol effort across these targets subject to a total budget \(B\). Formally, a patrol strategy is a vector \(\mathbf{x} = (x_1, \dots, x_K)\), where \(x_k\) denotes the amount of effort (e.g., patrol hours) assigned to target \(k\). The set of feasible strategies is $\mathcal{X} = \{\mathbf{x} \in \mathbb{R}^K \mid x_k \geq 0 \ \forall k,\ \sum_{k=1}^{K} x_k \leq B\}$, which ensures that every patrol effort is non-negative and that the total effort does not exceed the budget $B$.
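
As a small illustration with hypothetical numbers, take \(K = 3\) targets and a budget of \(B = 10\) patrol hours. Then
\[
\mathbf{x} = (5, 3, 2) \in \mathcal{X} \quad (5 + 3 + 2 = 10 \leq B),
\qquad
\mathbf{x}' = (6, 4, 2) \notin \mathcal{X} \quad (6 + 4 + 2 = 12 > B),
\]
and \(\mathbf{x}'' = (-1, 5, 5)\) is also infeasible because it violates non-negativity even though its entries sum to \(9 \leq B\).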




\textbf{Attacker Behavior via a Conditional Diffusion Model.} Building on the diffusion-model framework in Section~\ref{sec:diffusion}, we now focus on a poaching scenario, in which the attacker’s behavior can be highly uncertain and multimodal. Let \(\mathbf{z}\) be a \(K\)-dimensional vector whose entries give the number of snares or traps placed in each \(1 \times 1\) km region. Similarly, let \(\mathbf{c}\) represent contextual features, including last month’s patrol effort per region~\citep{pmlr-v161-xu21a,xu2020stay}, distance to the park boundary, elevation, and land cover. We model the attacker’s behavior with a continuous \emph{conditional diffusion model} \(p_{\theta}(\mathbf{z} \mid \mathbf{c})\). Concretely, we treat historical poaching data as samples of \(\mathbf{z}^0\), add noise in a forward process, and learn a reverse (denoising) process conditioned on \(\mathbf{c}\). Once trained, this diffusion model captures how poachers respond to different patrol allocations and environmental factors. By sampling from \(p_{\theta}(\mathbf{z} \mid \mathbf{c})\) for new contexts, we can generate realistic, diverse poaching scenarios that inform patrol strategy design. Table~\ref{table:mse} reports forecasting results on real-world poaching data: the \textbf{diffusion model outperforms existing approaches used in green security} (linear regression and Gaussian processes) in terms of MSE. Experimental details are in Appendix~\ref{sec:exp-details}.



\begin{table}[h]
\footnotesize
\centering
\begin{tabular}{@{}cc@{}}
\toprule
\textbf{Model} & \textbf{MSE} \\ \midrule
Linear regression  & $24.40$ \\
Gaussian process   & $24.21\pm 0.04$ \\
Diffusion model    & $\mathbf{23.46} \pm 0.07$ \\ \bottomrule
\end{tabular}
\caption{Forecasting performance in terms of mean squared error (MSE) on poaching data.}
\label{table:mse}
\end{table}




\textbf{Robust Patrol Optimization.} In practice, the learned diffusion model may be imperfect due to data noise, limited training samples, or suboptimal network training, so the learned distribution might not accurately capture the true underlying behavior. To ensure robustness in patrol strategy design, we assume that the true distribution lies within a KL-divergence ball of radius \(\rho\) around the learned distribution. We then optimize the worst-case expected utility over all distributions in that ball, leading to the following formulation:
\begin{align}
&\max_{\pi(\mathbf{x})\in \Delta(\mathcal{X})} \min_{\tau(\mathbf{z})\in \mathcal{T}} 
\quad  \mathbb{E}_{\pi(\mathbf{x})} \mathbb{E}_{\tau(\mathbf{z})} [u(\mathbf{x}, \mathbf{z})]  \nonumber\\
&\mathcal{T} = \left\{ \tau(\mathbf{z}) \mid D_{\rm KL}(\tau(\mathbf{z}) \parallel p_{\theta}(\mathbf{z} \mid \mathbf{c})) \leq \rho \right\},
\label{eq:DRO}
\end{align}
where \(\rho\) is a tolerance parameter specifying how far the true distribution may deviate from the learned distribution. The function \(u(\mathbf{x}, \mathbf{z})\) represents the defender’s utility (e.g., the number of animals in the park) for strategy \(\mathbf{x}\) given the adversary’s choice \(\mathbf{z}\), and is assumed to be bounded in \([0, M]\).
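
To give some intuition for the inner minimization, we sketch a standard duality result for KL-constrained problems; the dual variable \(\lambda\) and the optimizer \(\lambda^\star\) are notation introduced here, and this is an illustration rather than a description of the solution method. For a fixed pure strategy \(\mathbf{x}\) and bounded \(u\),
\[
\min_{\tau \in \mathcal{T}} \mathbb{E}_{\tau(\mathbf{z})}\bigl[u(\mathbf{x}, \mathbf{z})\bigr]
= \max_{\lambda \geq 0} \Bigl\{ -\lambda \log \mathbb{E}_{p_{\theta}(\mathbf{z} \mid \mathbf{c})}\bigl[\exp\bigl(-u(\mathbf{x}, \mathbf{z})/\lambda\bigr)\bigr] - \lambda \rho \Bigr\},
\]
and, when the maximum is attained at some \(\lambda^\star > 0\), the worst-case distribution is an exponential tilting of the learned model, \(\tau^\star(\mathbf{z}) \propto p_{\theta}(\mathbf{z} \mid \mathbf{c}) \exp\bigl(-u(\mathbf{x}, \mathbf{z})/\lambda^\star\bigr)\), which shifts probability mass toward low-utility outcomes. For a mixed strategy \(\pi\), the same expression holds with \(u(\mathbf{x}, \mathbf{z})\) replaced by \(\mathbb{E}_{\pi(\mathbf{x})}[u(\mathbf{x}, \mathbf{z})]\). As a sanity check, setting \(\rho = 0\) and letting \(\lambda \to \infty\) recovers the nominal expected utility under \(p_{\theta}(\mathbf{z} \mid \mathbf{c})\).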

Eq.~\eqref{eq:DRO} can be interpreted as a two-player zero-sum game in which the \emph{defender} seeks a robust mixed strategy \(\pi(\mathbf{x})\), while a \emph{nature adversary} (representing model misspecification) selects \(\tau(\mathbf{z})\) within the KL-divergence ball to minimize the defender’s expected utility. The defender’s pure strategy space is \(\mathcal{X}\), and the adversary’s pure strategy space is \(\mathcal{Z} = \mathrm{Support}(p_{\theta}(\mathbf{z} \mid \mathbf{c}))\). The defender’s mixed strategy \(\pi(\mathbf{x})\) is a probability distribution over \(\mathcal{X}\), with the corresponding space denoted by \(\Delta(\mathcal{X})\). In contrast, the adversary’s mixed strategy \(\tau(\mathbf{z})\) is restricted to the constrained space \(\mathcal{T}\).

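For notational simplicity, when both players use mixed strategies, the defender’s expected utility is denoted by \(U(\pi, \tau) \coloneq \mathbb{E}_{\pi(\mathbf{x})} \mathbb{E}_{\tau(\mathbf{z})} [u(\mathbf{x}, \mathbf{z})]\). If one player employs a pure strategy and the other a mixed strategy, we write \(U(\mathbf{x}, \tau) \coloneq U(\delta_{\mathbf{x}}, \tau)\), where \(\delta_{\mathbf{x}}\) denotes the point mass at \(\mathbf{x}\).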


Note that, unlike standard distributionally robust optimization (DRO), where the goal is typically to find a single strategy $\mathbf{x}$, we aim to identify a \emph{mixed} strategy for the defender. This is particularly important in green security settings because a randomized policy avoids predictability: a deterministic patrol strategy could be exploited by adversaries, such as poachers, who adapt their behavior to bypass predictable patterns. Randomizing the patrol strategy makes it harder for adversaries to anticipate the ranger’s actions, thereby improving the security and effectiveness of the patrol.




% The function \(u(\mathbf{x}, \mathbf{z})\) represents the defender’s utility for strategy \(\mathbf{x}\), given the adversary’s choice \(\mathbf{z}\), and is assumed to be bounded within \([0, M]\).  If both players adopt mixed strategies, the defender’s expected utility is denoted as \(U(\pi, \tau)\). In cases where one player follows a pure strategy while the other follows a mixed strategy, we use the shorthand notation \(U(\mathbf{x}, \tau) \coloneq U(\delta_{\mathbf{x}}, \tau)\).
% We assume that the utility function is concave with respect to the ranger's strategy, i.e., there is diminishing marginal return in ranger effort.  This assumption reflects the intuition that initial patrol efforts contribute more significantly to wildlife protection than additional increments in effort \haichuan{would be great if we can cite some empirical evidence}.

% In this formulation, the defender seeks to optimize a robust mixed strategy \(\pi(\mathbf{x})\), while the adversary selects \(\tau(\mathbf{z})\) in the KL-divergence ball to minimize the defender’s expected utility. This formulation explicitly accounts for observational noise, ensuring that the patrol strategy remains effective despite errors in the learned behavior model. If both players adopt mixed strategies, the defender’s expected utility is denoted as \(U(\pi, \tau)\). In cases where one player follows a pure strategy while the other follows a mixed strategy, we use the shorthand notation \(U(\mathbf{x}, \tau) \coloneq U(\delta_{\mathbf{x}}, \tau)\).






% \begin{assumption}[Concavity of Utility Function]\label{assump:concavity}
%     We assume that the utility function is twice differentiable with respect to $x$, and we assume that $\frac{\partial^2 U(x,\tau)}{\partial x^2} \leq 0 \ \forall x$, i.e., the utility function is concave in ranger effort.
% \end{assumption}




% The concavity assumption implies that $\frac{\partial V(a,q(z))}{\partial a}$ is decreasing in $a$, which is equivalent to saying there is diminishing marginal return to ranger effort. The intuition is within the first unit of effort, the most obvious snares in the field are first removed. Hence, the same effort will lead to less snares found later.








