

\begin{figure*}
    \centering
    \includegraphics[width=0.95\linewidth]{figures/regret_comparison.pdf}
    \caption{Experimental Results on both synthetic and real-world datasets. Following \citep{ho2020denoising}, we average the results over 5 random seeds.}
    \label{fig:exp}
\end{figure*}

\section{Experiments}\label{sec:experiments}

\subsection{Experimental Setup}


\paragraph{Datasets}
We conduct experiments on both synthetic and
real-world datasets which we describe below. We use
a graph based dataset~\citep{nguyen2016capture} to reflect geospatial constraints
in the poaching domain for patrollers.

 \textbf{Synthetic data.} 
Poaching counts are sampled from a Gamma distribution parameterized by shape and scale values. To determine the shape parameter, we randomly select one of two Graph Convolutional Networks (GCNs)~\citep{kipf2022semi} with randomly initialized weights to map the node’s feature vector to a continuous value, which serves as the shape parameter. The scale parameter is set to 1 if the first GCN is chosen and 0.9 if the second is selected. Finally, adversarial noise inversely proportional to the poaching count is added, ensuring that nodes with lower poaching counts receive higher noise levels.%Counts are clipped to [0, 40] and finally scaled by 0.2 to resemble the distribution observed in real-world data.

 
% To mimic the real-world setting, regions are connected with each other according to topology. We randomly generate $5100$ graphs with $30$ nodes and $20$ edges. The first $5000$ of them are used as train set, and the rest $100$ are used as test set. For each node, we randomly generate a $10$ dimensional node feature. The next step is to construct a random map from a node's feature to its poaching count, modeling the complex real-world connection between them. We sample a node's poaching count from gamma distribution, which has shape and scale parameters. We initialize two Graph Convolutional Networks (GCN). For each node, we randomly select one of the GCNs to map the node feature to a shape parameter with equal probability. We scale the shape parameter by a factor of $20$. We then sample the node's poaching count from a gamma distribution. If the first GCN is chosen, the scale is set to $1$, and if the second GCN is chosen, the scale is set to $0.9$. Noises on poaching count are then adversarially added: nodes with smaller poaching count will have a higher noise. We cap the poaching count on each node to the range of $[0,40]$, and finally we scale poaching counts on every node by $0.2$ to make the overall poaching count distribution similar to those in real-world data.

    
\textbf{Real-world Data.} We use poaching data from Murchison Falls National Park (MFNP) in Uganda, collected between 2010 and 2021. The protected area is discretized into 1 × 1 km grid cells. For each cell, we measure ranger patrol effort (in kilometers patrolled) as the conditional variable for the diffusion model, while the monthly number of detected illegal activity instances serves as the adversarial behavior. Following~\citet{basak2016abstraction}, we represent the park as a graph to capture geospatial connectivity among these cells. To focus on high-risk regions, we subsample 20 subgraphs from the entire graph. Specifically, at each month we identify the 20 cells with the highest poaching counts. Each of these cells is treated as a central node, and we iteratively add the neighboring cell with the highest poaching count until the subgraph reaches 20 nodes. This process generates 532 training, 62 validation, and 31 test samples.


\textbf{Baselines}
We compare the following methods:

\textbf{Non-robust Optimization (NRO).} We use a baseline that directly maximizes the expected utility under the pre-trained diffusion model. The stochastic optimization is solved via sample average approximation, using samples from the diffusion model in conjunction with mirror ascent. Since this approach yields only a pure strategy, we repeat the process with different initializations to obtain five distinct pure strategies. These pure strategies are then combined into a mixed strategy by assigning them equal probability.  

\textbf{Alternate Optimization with Random Reinitialization (AOR).} This method solves the DRO problem using alternating optimization without employing the double oracle framework. It iterates between optimizing the defender's strategy and sampling from the worst-case distribution using twisted SMC. Similar to NRO, we construct a mixed strategy by running the procedure five times with different initializations, generating multiple pure strategies that are then combined with equal probability.  

We also compare against three variations of \ours:  

\textbf{\ours with Importance Sampling (\ours-IS).} This variant replaces the twisted SMC sampler in Section~\ref{sec:twisted_sampler} with importance sampling to sample from Eq.~\ref{eq:bo-poacher}. As the proposal distribution, we directly use the pre-trained diffusion model, $p_{\theta}(\mathbf{z}|\mathbf{c})$.  

\textbf{\ours with Diffusion Posterior Sampling (\ours-DPS).} This variant employs the diffusion posterior sampler~\citep{chung2023diffusion} instead of the twisted SMC sampler in Section~\ref{sec:twisted_sampler}.   

\textbf{\ours.} This version retains the default twisted SMC sampler in Section~\ref{sec:twisted_sampler} to sample from Eq.~\ref{eq:bo-poacher}.  


\textbf{Evaluation metrics}
We evaluate the methods using decision \textit{regret}, defined as the difference between the defender's best possible utility under the true adversarial behavior and the expected utility under the optimized mixed strategy. We report both the average regret on the test set and the worst-case regret on the test set.

\textbf{Implementation details}
We employ a three-layer GCN with a hidden dimension of 128 as the backbone of the diffusion model. The optimizer used is Adam~\citep{kingma2014adam} with a learning rate of $10^{-3}$. We use $500$ samples to estimate the expected utility for all the menthods. $\gamma$ is selected on the validation set based on the average regret. More details of the implementation details are provided in Appendix~\ref{sec:exp-details}.

%The actions of the poacher and ranger in grid $j$, represented by $z_j$ and $x_j$ respectively, influence the wildlife population in the area. We model the wildlife population in grid $j$ as follows: 
% $$\max(N_0(j)e^r-\alpha e^{\beta z_j - \theta x_j}, 0),$$
% where $N_0(j)$ is the initial wildlife population in the area and $r$ denotes the natural growth rate of the wildlife. The parameter $\alpha$ captures the impact of both the ranger’s and poacher’s actions on the wildlife population, $\beta$ reflects the strength of poaching, and $\theta$ measures the effectiveness of patrol effort. The utility for the ranger is then represented as the sum of wildlife population across all grids:
% $$U(x, z) = \sum_{j=1}^K \max(N_0(j)e^r-\alpha e^{\beta z_j - \theta x_j}, 0)$$


\subsection{Experimental Results}


\paragraph{Main Results.} We evaluate our method against baselines on both synthetic and real-world poaching datasets under different patrol budgets (\(B=1\) and \(B=5\)). The results, presented in Fig.~\ref{fig:exp}, show that \ours consistently achieves the lowest average regret and worst-case regret across all settings.  

Compared to NRO, \ours reduces average regret by \(62.2\%\), \(62.9\%\), \(73.3\%\), and \(74.0\%\) across different datasets and budgets. Similarly, worst-case regret is significantly reduced by \(59.1\%\), \(64.9\%\), \(71.3\%\), and \(79.0\%\). These improvements highlight the robustness of our approach, which is particularly crucial in green security domains, where minimizing worst-case outcomes is essential for high-stakes decision-making.  

The double-oracle framework provides substantial benefits, as all three variants of \ours significantly outperform the naive robust optimization approach, AOR. This is because AOR relies on a simple heuristic to solve the minimax problem and construct the mixed strategy, lacking convergence guarantees. Consequently, AOR exhibits greater variance and instability, further underscoring the advantages of employing game-theoretic methods for robust optimization. 

% From a game-theoretic perspective, AOR does not consider mixed strategies during optimization, and an equilibrium of pure strategies is not always guaranteed to exist. 

Among the \ours methods employing different sampling strategies, DPS emerges as the strongest alternative to twisted SMC. However, SMC consistently outperforms DPS, demonstrating statistically significant improvements in five out of eight cases. Furthermore, DPS cannot exactly sample from the target distribution in Eq.\ref{eq:bo-poacher}~\citep{lu2023contrastive}, a critical requirement for ensuring the convergence of the double-oracle framework, as analyzed in Section~\ref{sec:convergence_analysis}.


\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{figures/regret_vs_samples_gamma.pdf}
    \caption{Parameter Study on \ours-SMC on poaching data under Budget 5.}
    \label{fig:parameter}
\end{figure}

\textbf{Parameter Study.} Fig.~\ref{fig:parameter} presents the parameter study of \ours using the twisted SMC sampler. As shown, varying the number of samples in the sampler reveals that once the sample size exceeds 200, performance stabilizes. Additionally, when adjusting the value of $\gamma$, we observe that performance drops significantly as $\gamma$ approaches 0, since it effectively reduces to non-robust optimization. Conversely, when $\gamma$ is too large, performance also declines because the nature adversary may select a worst-case distribution that deviates too far from the learned distribution, making it non-informative.
