\section{Experimental Results}

This section presents the complete results of the experiments discussed in the main part of the paper. We provide detailed model information, report the results of \integration for additional normalized relative error thresholds, and give the complete results of the ablation study.

\subsection{Model Info}
\label{app:sec:models}

Table~\ref{tab:app-model-info} presents detailed model information for all the models considered in our experimental evaluation.

\begin{table*}[h]
\setlength{\tabcolsep}{6pt}
\centering
\input{tables/appendix-model-info}
\caption{Detailed model information. Column $|S|$ denotes the number of states, $|\variables|$ the number of features, $|\Act|$ the number of actions, ``choices'' the total number of available actions (i.e., $\sum_{s \in S} |\Act(s)|$), and ``opt''/``rand'' the values of the optimal and the uniform random policy, respectively.
}
\label{tab:app-model-info}
\end{table*}

\subsection{Different Error Thresholds}
\label{app:sec:error}

Table~\ref{tab:app-different-threshold} presents the results for \integration when the normalized relative error threshold is set to $0.1$. Table~\ref{tab:app-lower-threshold} shows the results for tighter thresholds of $0.01$ and $0.0001$ (1\% and 0.01\%, respectively).

\begin{table*}[h]
\setlength{\tabcolsep}{2pt}
\centering
\input{tables/appendix-different-threshold}
\caption{Results for \integration with the normalized relative error threshold set to $0.1$. \integration-0.05 represents the values reported in Table~\ref{tab:main-table} in the main paper. The last column reports the relative size of the produced tree when the error threshold is increased.
}
\label{tab:app-different-threshold}
\end{table*}

\begin{table*}[h]
\setlength{\tabcolsep}{2pt}
\centering
\smaller
\input{tables/appendix-lower-threshold}
\caption{Results for \integration with the normalized relative error threshold set to $0.01$ and $0.0001$ (1\% and 0.01\%, respectively). \integration-0.05 represents the values reported in Table~\ref{tab:main-table} in the main paper. The size\% columns show the size of the produced DT relative to the size of the \dtcontrol tree (in terms of the number of nodes). An error of 0 means that the relative error was below \num{1e-6}.
}
\label{tab:app-lower-threshold}
\end{table*}
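
As a reading aid for the size\% columns of Table~\ref{tab:app-lower-threshold}, and assuming that sizes are counted in nodes as stated in the caption, the reported value can be understood as
\[
  \mathrm{size}\% \;=\; 100 \cdot \frac{|T_{\mathrm{out}}|}{|T_{\mathrm{init}}|},
\]
where $|T_{\mathrm{out}}|$ denotes the number of nodes of the DT produced by \integration and $|T_{\mathrm{init}}|$ the number of nodes of the corresponding \dtcontrol tree. The symbols $T_{\mathrm{out}}$ and $T_{\mathrm{init}}$ are introduced here only for illustration and are not used elsewhere in the paper. Under this reading, a value below $100$ indicates that the produced DT is smaller than the \dtcontrol tree.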

\subsection{Complete Ablation Study}
\label{app:sec:ablation}

Table~\ref{tab:app-ablation} shows the complete results of the ablation study. For each model, we ran \dtcontrol with four different settings to obtain different initial DTs, which gives $13 \cdot 4 = 52$ benchmarks in total. Note that even when \dtcontrol produces the same tree under two settings, the impurity values differ and can therefore lead to different results.

\begin{table*}[h]
\setlength{\tabcolsep}{2pt}
\centering
\smaller
\input{tables/appendix-ablation}
\caption{Complete results for the ablation study experiments.
}
\label{tab:app-ablation}
\end{table*}


\section{Decision Tree Visualization}
\label{app:sec:visualizations}

In this section, we provide visualizations of some of the DTs produced by \integration. We also provide a visualization of the DT produced by \dtcontrol on the \emph{maze-7} benchmark to showcase the improved explainability of the DTs produced by \integration.

\subsection{Interpretability Comparison}

Figures~\ref{fig:maze-7-dtnest} and~\ref{fig:maze-7-dtcontrol} show two DTs for the \emph{maze-7} benchmark, produced by \integration and by \dtcontrol, respectively. These visualizations illustrate that the DTs produced by \integration are easier to interpret, and therefore more practical to use, than those produced by \dtcontrol.

\begin{figure}[h]
    \begin{subfigure}{\linewidth}
        \centering
        \includegraphics[width=0.5\linewidth]{figures/maze-7.png}
        \caption{}
        \label{fig:maze-7-dtnest}
    \end{subfigure}
    
    \begin{subfigure}{\linewidth}
        \centering
        \includegraphics[width=\linewidth]{figures/maze-7-dtcontrol.png}
        \caption{}
        \label{fig:maze-7-dtcontrol}
    \end{subfigure}
    \caption{(a) DT produced by \integration on the \emph{maze-7} benchmark. (b) DT produced by \dtcontrol on the \emph{maze-7} benchmark.}
    \label{fig:maze-7}
\end{figure}


\subsection{Another Visualization}

Figure~\ref{fig:csma-3-2-dt} provides another visualization of a DT produced by \integration, this time for the benchmark \emph{csma-3-2}. In contrast to \emph{maze-7}, not every action is available in every state of \emph{csma-3-2}. Nonetheless, as can be seen, none of the leaf nodes contains the random action $\actrandom$.

\begin{figure}[h]
    \centering
    \includegraphics[width=0.45\linewidth]{figures/csma-3-2-some.png}
    \caption{DT produced by \integration on the \emph{csma-3-2} benchmark.}
    \label{fig:csma-3-2-dt}
\end{figure}