\section{Simulations}
\label{sec:experiments}
\begin{figure*}[!htb]
    \centering
    %\includegraphics[width=0.4\textwidth]{img/uai/main/error_stat_vs_causal_10000_100_1_0_order5.png} Only contains one panel with order 5
    \includegraphics[width=0.80\textwidth]{img/uai/main/error_stat_vs_causal_10000_100_1_0_all.png}
    \caption{The causal error $\mathcal{G}$ versus the statistical error $\mathcal{S}$ for AR($p$) processes with $p=3, 5, 7$.}
    \label{fig:error}
\end{figure*}

\begin{figure*}[!htb]
    \centering
    \includegraphics[width=0.80\textwidth]{img/uai/main/correlation_vs_error_10000_100_1_0_ols_new.png}
    %\includegraphics[width=0.4\textwidth]{img/uai/main/correlation_vs_error_10000_100_1_0_ols5.png}  This one only contains one panel with order 5
    \includegraphics[width=0.80\textwidth]{img/uai/main/correlation_vs_error_10000_100_1_0_ridge_new.png}
    %\includegraphics[width=0.4\textwidth]{img/uai/main/correlation_vs_error_10000_100_1_0_ridge5.png}  This one only contains one panel with order 5
    \caption{The maximal difference between statistical error $\mathcal{S}$ and causal error $\mathcal{G}$, together with an estimate of the generalization bound in Theorem \ref{thm:main}, for increasing condition number $\kappa$ and process orders $p=3,5,7$ (from left to right). The maximum is taken over the 500 datasets with the closest $\kappa$. Up to constant factors, our theoretical bounds (orange) closely match the empirical evaluations (blue).}
    \label{fig:corr_vs_err}
\end{figure*}

%
To examine how the causal and statistical risks behave in practice, we simulate AR processes and study the errors of different estimators.
For each presented plot, we draw parameters for 10,000 stationary AR($p$) processes using rejection sampling: we draw the coefficients of each process independently and uniformly from $[-2, 2]$ and reject parameter sets that yield a non-stationary process.
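The rejection step can be made explicit with the standard root criterion. As a sketch, assuming the usual parameterization $X_t = \sum_{i=1}^{p} a_i X_{t-i} + \varepsilon_t$ (this notation and this particular check are assumptions made for illustration, not necessarily the test used in the simulation code), a draw $(a_1, \dots, a_p)$ is kept whenever
\[
    1 - \sum_{i=1}^{p} a_i z^i \neq 0 \quad \mbox{for all complex } z \mbox{ with } |z| \leq 1,
\]
that is, whenever all roots of the characteristic polynomial lie strictly outside the unit circle. For $p=1$ this reduces to $|a_1| < 1$, so half of the draws from $[-2, 2]$ would be rejected.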
For each process, we draw a training sample of 100 time steps and a test sample of 1,000 time steps.
For all figures in the main paper we set $\omega=1$.
To estimate the coefficients we use Ordinary Least Squares (OLS), which minimizes the empirical statistical error $\sum_{i} (y_i-\hat{y}_i)^2$, where $\hat{y}_i$ denotes the model prediction with estimated parameters $\hat{a}$.
In Appendix \ref{sec:additional_simulations} we provide additional plots with a hidden confounder, with varying order, sample size, and $\omega$, and with other estimators: Ridge, Lasso, and Elastic Net regressors.
%

In line with our theoretical results, we find that even for simple scalar AR processes of small orders, the causal error of the estimators is often several times larger than the statistical error (see Figure \ref{fig:error}).
In Figure \ref{fig:corr_vs_err} we sort the randomly drawn datasets by their autocorrelation (measured by the condition number $\kappa$ of the autocorrelation matrix) and split the sorted list into buckets of 500 datasets. For each bucket we compute the maximum, mean, and 90\% quantile of the difference between the causal and statistical errors for the OLS and Ridge estimators. The plots for the other estimators are provided in Appendix \ref{sec:additional_simulations}. Up to constant factors, our theoretical finite-sample causal generalization bound matches the empirically observed difference between the causal and statistical risks.

