%!TEX root=../main.tex

\section{Introduction}\label{sec:intro}

Adversarial examples, inputs obtained by applying small perturbations
to correctly classified inputs so that they are misclassified, have
been widely documented in the
literature~\citep{szegedy2013intriguing,goodfellow2014explaining}.
Although mostly studied in the context of neural networks, research
has demonstrated that decision-tree ensembles are also susceptible to
adversarial perturbations~\citep{papernot2016transferability}. Such
vulnerabilities are particularly concerning in safety-critical
applications, where robust model performance is essential for
deployment~\citep{malware-detect}.


Several methods have been put forward to mitigate the impact of
adversarial examples in the context of neural networks. Some
approximate the worst-case loss under adversarial perturbations for
use during training~\citep{goodfellow2014explaining,madry2018minmax}.
Others aim to identify a provable upper bound on the loss function
under adversarial
perturbations~\citep{salman2019provably,zhang2019theoretically,
  huang2021training,mueller2023certified,DePalma2024}.


Similar approaches have been proposed for decision-tree ensembles,
particularly with respect to the derivation of robust splits and the
minimisation of the worst-case adversarial loss within the ensemble
building
process~\citep{Kantchelian,chen2019training,andriushchenko2019provably,
  vos2021groot,fprdt2022}.  A common theme among these works is their
reliance on properties of particular loss functions, such as
% the information gain~\cite{chen2019training} and Gini impurity
the Gini impurity~\citep{vos2021groot}, margin-based classification
loss functions~\citep{andriushchenko2019provably}, and the binary (0-1)
loss function~\citep{fprdt2022}. These approaches are therefore
limited to classification and do not address other tasks such as
regression, which is prevalent in domains such as finance.
To the best of our knowledge, only the method of
\cite{chen2019training} supports robust training for general loss
functions; however, it relies on a heuristic to estimate the loss
under an adversarial perturbation, which can lead to suboptimal
robustness. Robust training of tree ensembles for regression tasks
therefore remains largely unexplored.
%of
%considerable importance.

To overcome these shortcomings, we present a novel approach to
construct robust ensembles in the XGBoost framework that can be
generally applied across various tasks and loss functions,
%AL: added
with particular applicability to regression tasks.
Our contributions can be summarised as follows:

\begin{itemize}
    \item We introduce an efficient analytical solution for an upper
      bound on the robust loss when building XGBoost trees. This
      incorporates the impact of worst-case adversarial perturbations
      into the recursive node-splitting procedure in $\mathcal{O}(1)$
      time per candidate split, leading to an overall complexity of
      $\mathcal{O}(n\log n)$. Our solution is general and can be
      applied to XGBoost ensembles with any loss function and
      adversarial attack model.
    
    % we can add statistics about the results once the experiments are completed
    \item We study the robustness of XGBoost ensembles on regression
      tasks and highlight how conventionally trained models are
      highly sensitive to input perturbations. We demonstrate
      that our proposed robust-splitting criterion significantly
      improves the robustness of the model, particularly against
      attacks with large perturbation magnitudes.
\end{itemize}

The rest of this work is organised as follows. We begin with a review
of related work in Section~\ref{sec:related}. 
Section~\ref{sec:background} provides an overview of the XGBoost
algorithm and the adversarial attack model used in this work.
Section~\ref{sec:rob-loss} describes our proposed robust-splitting
criterion; Section~\ref{sec:evaluation} evaluates the approach and
compares it with related work on a variety of datasets. Finally,
Section~\ref{sec:conclusion} concludes the work and discusses
potential future directions.

\section{Related Work}
\label{sec:related}

Early seminal work on the robust training of tree-based models
considered augmenting the training process with adversarial examples
in successive boosting rounds~\citep{Kantchelian}. The work also
showed that identifying optimal adversarial examples under input
constraints is NP-hard for an ensemble of trees, and provided a MILP
formulation that exactly computes the minimal adversarial distance for
a boosted ensemble. Although the training method is computationally
efficient, it relies on sampling a finite number of adversarial
examples when constructing trees, which is equivalent to optimising a
lower bound on the robust loss and ultimately results in suboptimal
certified robustness. In contrast, our approach optimises a certified
upper bound on the robust loss, thereby achieving superior robustness.
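
To make the distinction between the two kinds of bound explicit, the
following sketch uses illustrative notation that is not taken from the
cited works: $f$ is the model, $\ell$ a loss function, $\Delta(x)$ the
set of admissible perturbed versions of an input $x$, $S(x) \subseteq
\Delta(x)$ a non-empty finite sample of adversarial examples, and
$\overline{\ell}(x, y)$ any certified upper bound. Under these
assumptions,
\[
  \max_{x' \in S(x)} \ell\bigl(f(x'), y\bigr)
  \;\le\;
  \max_{x' \in \Delta(x)} \ell\bigl(f(x'), y\bigr)
  \;\le\;
  \overline{\ell}(x, y).
\]
Minimising the left-hand side gives no guarantee on the worst-case
loss in the middle, whereas any value attained by the right-hand side
also bounds the worst-case loss.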

More recent contributions aimed to increase the robustness of
tree-based models by considering the worst-case adversarial loss when
evaluating the quality of a candidate split. In particular,
\cite{chen2019training} presents a framework to determine the
worst-case splitting score for a candidate split under
$L_{\infty}$-bounded input perturbations. It specifically gives a
Gini impurity scoring function that can be computed exactly in
$\mathcal{O}(n)$ time, and further presents a heuristic to estimate
the worst-case adversarial loss for any scoring function in
$\mathcal{O}(1)$ time. The method is integrated within the XGBoost
algorithm and is used to train robust decision trees and boosted
ensembles. However, since the heuristic effectively provides only a
lower bound on the robust loss, it can lead to suboptimal robustness
during tree construction. An extension to the framework is provided by
\cite{Treant}, which considers an asymmetrical, non-uniform attack
model characterised by axis-aligned perturbations and introduces the
concept of an attack budget, limiting the number of points that can be
perturbed. In that work, the attack model is integrated into a
robust-loss function that is computed exactly per split. The method
therefore has high computational complexity and does not scale to
large datasets. In contrast to these approaches, our method computes
an upper bound on the robust loss in $\mathcal{O}(1)$ time, which
leads to improved robustness without compromising computational
efficiency.


Similarly, \cite{vos2021groot} presents an exact analytical solution
for the worst-case Gini impurity score under an adversarial
perturbation that can be computed in $\mathcal{O}(1)$ time. The work
proves in particular that the robust loss function used in the split
construction process is concave and can thus be solved analytically.
However, the solution is specific to the Gini impurity score and the
construction of independent classification trees. Therefore, it cannot
be used to build boosted ensembles or trees for other tasks.

Other methods have been proposed to determine an upper bound on the robust
loss over boosted ensembles and to train successive trees that minimise
this loss. In particular, \cite{andriushchenko2019provably} introduces a
method for obtaining an upper bound on the robust loss for binary
classification tasks using a margin-based loss function over the
ensemble, with a computational complexity of $\mathcal{O}(n)$.
This approach is further optimised in \cite{fprdt2022}, where a 0-1
loss function is employed for AdaBoost ensembles~\citep{adaboost},
resulting in a robust loss computation per split in $\mathcal{O}(1)$ time.
However, these formulations are tailored specifically for classification
tasks and are not easily extended to other domains. Moreover,
the optimisation in \cite{fprdt2022} exploits unique properties
of the 0-1 loss function, rendering the approach inapplicable
to other loss functions, including convex ones. In contrast, our
method provides an
analytical solution for an upper bound on the robust loss that can be
applied to any loss function, thereby offering greater flexibility
and broader applicability across various tasks.

Similarly to~\cite{chen2019training}, our work targets the computation
of robust splitting scores. However, instead of relying on an
approximate heuristic to estimate the robust loss, we propose an
efficient analytical solution that computes an upper bound on the
worst-case loss in $\mathcal{O}(1)$ time using a linear relaxation
of the splitting score used in an XGBoost tree. As the
XGBoost algorithm constructs successive trees to minimise a
second-order Taylor approximation of the loss function, our method
supports all twice-differentiable loss functions and can be applied
across various tasks.
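
For reference, the second-order approximation mentioned above takes
the following standard form in the XGBoost literature; the notation
here is an assumption and may differ from that of
Section~\ref{sec:background}. At boosting round $t$, with $f_t$ the
new tree, $\hat{y}_i^{(t-1)}$ the current prediction for sample $i$,
$l$ the loss function and $\Omega$ the regularisation term, the
objective is approximated, up to terms that do not depend on $f_t$, by
\[
  \sum_{i=1}^{n} \Bigl[ g_i f_t(x_i) + \frac{1}{2} h_i f_t(x_i)^2 \Bigr]
  + \Omega(f_t),
\]
where $g_i$ and $h_i$ denote the first and second derivatives of
$l(y_i, \hat{y})$ with respect to $\hat{y}$, evaluated at
$\hat{y} = \hat{y}_i^{(t-1)}$. The loss enters tree construction only
through $g_i$ and $h_i$, which is why a splitting criterion expressed
in these quantities is not tied to a particular task.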

Other related works propose more computationally intensive approaches, 
such as MILP formulations for robust optimal trees~\citep{vos2022optimal}, and
post-training methods to prune and relabel tree leaves~\citep{vos2022relabelling}, 
to achieve robustness. 
However, due to their exponential or high polynomial time complexity, 
these methods are impractical for large-scale datasets compared to our approach.
