\section{Introduction}
\label{sec:intro}
\begin{figure*}[htp]
     \centering
     %\hfill
     \begin{subfigure}[b]{0.99\textwidth}
         \centering
         \includegraphics[width=0.35\textwidth]{img/uai/main/ex_observation.png}
         \includegraphics[width=0.35\textwidth]{img/uai/main/ex_intervention.png}
     \end{subfigure}
     \caption{\label{fig:real_example} An example time series from the Traffic dataset with the predictions of two DeepAR models (top) and their predictions under an intervention, shown in red (bottom). Although the ground truth is unknown, the two models disagree more under the intervention than in in-distribution forecasting. Since at most one of them can be right, at least one of them must make a notable forecasting error under the intervention.}
\end{figure*}
Forecasting algorithms are increasingly relevant in a variety of applications, including meteorology, climatology, economics, and business. While traditional economic modelling relies on relatively simple time series models
\parencite{brockwell1991time}, e.g., autoregressive models, or on methods like co-integration, modern business planning relies heavily on neural networks for forecasting \parencite{Faloutsos2018,Januschowski2020,Salinas2021}. Despite advances in forecast quality, the causal implications of these models are not yet well understood. There has been notable progress on `explainable' models in the sense of feature relevance \parencite{Lundberg2017,Molnar2019,janzing2020feature,wang2020}, with potential applications in forecasting. Furthermore, specialized models \parencite{hatt2021sequential,bica20,Lim2018} have shown remarkable success for causal inference in forecasting.

It is common practice in business and econometrics to learn statistical forecasting models and interpret them causally. This practice is considered justified under simplifying assumptions such as causal sufficiency and the absence of contemporaneous effects (see, for instance, \textcite[Section 1]{hyvarinen2010estimation}). In practice, however, forecasting models that agree on their statistical predictions can differ substantially on their causal predictions (see Figure~\ref{fig:real_example} for an example).
Here, we are interested in a fundamental question: what is the relation between the statistical predictability of a forecasting model and its causal generalizability, that is, its ability to predict under interventions?


We argue that even for very simple models, and even under simplifying assumptions such as causal sufficiency and the absence of contemporaneous influence, the causal interpretation of forecasting models is non-trivial.
To appreciate the challenges, consider a simple example of a process with strongly correlated observations, where
 $x_t\approx x_{t-1}$, and hence $x_t\approx x_{t-2}$.
 These observations can be explained either by a causal model with a strong influence of $x_{t-1}$ on $x_t$ or by a causal model with a strong influence of $x_{t-2}$ on $x_t$. The difference between the models becomes apparent when an intervention randomizes $x_{t-1}$ and $x_{t-2}$ independently. Then, prediction
 becomes hard, particularly when $x_{t-1}$
 and $x_{t-2}$ are set to significantly different values. While both models are similar in their statistical predictions, they differ substantially in their \textit{causal predictions}. This example already shows that, even in a simple setting, causal and statistical predictability can differ significantly. The question of causal generalization is thus practically relevant and non-trivial, and it calls for a better theoretical understanding.
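
As a minimal illustration, under assumptions that are ours and not part of the formal setup (a stationary process with variance $\sigma^2$ and lag-one autocorrelation $\rho$ close to one), compare the two hypothetical predictors $\hat{x}_t^{(1)} = x_{t-1}$ and $\hat{x}_t^{(2)} = x_{t-2}$. Observationally, their mean squared disagreement is
\[
\mathrm{E}\big[(\hat{x}_t^{(1)} - \hat{x}_t^{(2)})^2\big] = \mathrm{E}\big[(x_{t-1}-x_{t-2})^2\big] = 2\sigma^2(1-\rho),
\]
which vanishes as $\rho \to 1$. If an intervention instead sets $x_{t-1}=u$ and $x_{t-2}=v$, with $u$ and $v$ drawn independently from a common distribution with variance $\sigma^2$, then
\[
\mathrm{E}\big[(u-v)^2\big] = 2\sigma^2,
\]
regardless of $\rho$. In this sketch, the two predictors are nearly indistinguishable on observational data, yet their disagreement under the intervention is of the order of the marginal variance of the process.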
 
Specifically, we consider the simple class of vector autoregressive (VAR) models and ask the following question:
\begin{center}
{%
        \textit{How does the efficacy of an autoregressive model in predicting statistical associations compare with its ability to predict under interventions?}
    }%
\end{center}
VAR models are widely applied in domains ranging from econometrics \parencite{lutkepohl2009econometric, grabowski2020tobit} and finance \parencite{zivot2006vector} to neuroscience \parencite{valdes2005estimating}.
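
For concreteness, a VAR model of order $p$ for a $d$-dimensional process is commonly written in the form below. The symbols $d$, $p$, $A_i$, and $\varepsilon_t$ are notation assumed here for illustration and may differ from the notation used in later sections:
\[
x_t = \sum_{i=1}^{p} A_i\, x_{t-i} + \varepsilon_t ,
\]
where each $A_i$ is a $d \times d$ coefficient matrix and $\varepsilon_t$ is a zero-mean noise term independent of the past. A forecaster with estimated coefficients $\hat{A}_i$ predicts $\hat{x}_t = \sum_{i=1}^{p} \hat{A}_i\, x_{t-i}$. Statistical prediction evaluates this forecast on lagged values generated by the process itself, whereas prediction under interventions evaluates it on lagged values that are set externally.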

 
 {\textbf{Connection to Covariate Shift.} The problem of causal generalization is closely related to the problem of covariate shift. To see this, we first ignore the time series setting and consider a scenario in which a variable $Y$ is to be predicted from a variable $X$ that is known not to be an effect of $Y$.
If there is no common cause of $X$ and $Y$, that is, if we assume causal sufficiency \parencite{Spirtes1993}, the statistical relation
between $X$ and $Y$ is entirely due to the influence of $X$ on $Y$.
Therefore, the observational and interventional conditionals coincide ($P_{Y \mid X=x^*}=P_{Y \mid do(X=x^*)}$ in Pearl's language \parencite{pearl2009causality}), and the true parameters are optimal from both a statistical and a causal perspective.
However, due to \textit{estimation bias}, a prediction model learned from finite samples drawn from $P_X$ may perform poorly when randomized interventions draw $x$-values
from a different distribution $\tilde{P}_X$, which is the usual covariate shift scenario \parencite{Masashi2012}. In our setting, $X$ and $Y$ are represented by the past and the present values of a (possibly multivariate) time series, respectively. Accordingly, we focus on
interventional distributions that are natural for
this setting: independent interventions at different time points and on different components of the multivariate process. Hence, we have additional structure compared with the standard covariate shift problem. We are not aware of any theoretical work on covariate shift in the time-series setting. Nevertheless, we describe the connections to learning theory in the standard covariate shift setting and to other related work in Section \ref{sec:related_work}.}
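
A small worked example makes this concrete, under assumptions of our own choosing: a scalar linear model $Y=\beta X+\varepsilon$ with zero-mean noise $\varepsilon$ independent of $X$, squared loss, and an estimate $\hat\beta$ that is held fixed. The risk of the predictor $x\mapsto\hat\beta x$ under an input distribution $Q$ is
\[
\mathrm{E}_{X\sim Q}\big[(Y-\hat\beta X)^2\big] = \mathrm{E}[\varepsilon^2] + (\hat\beta-\beta)^2\,\mathrm{E}_{X\sim Q}[X^2].
\]
The conditional of $Y$ given $X$ is the same for $Q=P_X$ and $Q=\tilde{P}_X$, so the two risks differ only through the second moment of the inputs: the same estimation error $(\hat\beta-\beta)^2$ is multiplied by $\mathrm{E}_{X\sim\tilde{P}_X}[X^2]$ instead of $\mathrm{E}_{X\sim P_X}[X^2]$. In the multivariate analogue, the second term becomes $(\hat\beta-\beta)^\top\,\mathrm{E}_{X\sim Q}[XX^\top]\,(\hat\beta-\beta)$, so estimation errors along directions with little variance under $P_X$ are cheap observationally but can be costly when interventions excite those directions.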

\textbf{Our Contributions.}
Our central goal in this work is to develop a formal and thorough understanding of causal generalization for the class of VAR models.

% \vspace{-3mm}
\begin{enumerate}[leftmargin=0.2cm, label=\alph*.]
\setlength\itemsep{0.1em}
    \item  To this end, we introduce a framework of causal learning theory for forecasting to analyze when forecasting models can generalize from the \textit{observational} to the \textit{interventional distributions} (Section \ref{sec:clt}). This is closely related to the setting of learning under domain adaptation.
    %
    \item Using this framework, we provide a characterization of the difference between the statistical and \textit{causal} risks (Section \ref{sec:results}). Such a characterization allows us to identify the sources of divergence between the two quantities. Our results show that the strength of correlation of the underlying process plays a key role in determining causal generalizability. They also highlight that, already for simple models, causal and statistical errors can diverge.
    %
    \item Further, we provide finite-sample, uniform convergence bounds on causal generalization for the class of VAR models (Section \ref{sec:results}). Our simulations demonstrate that our bounds indeed capture the key drivers of causal generalization. To the best of our knowledge, this is the first work that provides theoretical guarantees for causal generalization of any kind in the time-series setting.
    %
    \item As a by-product of our analysis, we provide an explicit characterization of the powers of a companion matrix (see Section \ref{sec:clt}) using symmetric Schur polynomials \parencite{macdonald1998symmetric} of its eigenvalues (Lemma \ref{lemma:coef_as_schur}), which,
    to the best of our knowledge, has not been noted in the literature. This result could be of independent interest for theoretical work that builds upon companion matrices, which, for instance, are ubiquitous in stochastic processes and in linear time-invariant dynamical systems \parencite{davison1976robust, melnyk2016estimating}.
    %
    \item We conduct experiments with a variety of deep neural networks on real data. Our experiments approach causal risks in this setting and explore their relationship to uncertainty.
\end{enumerate}
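
To indicate why powers of a companion matrix arise, consider the following sketch for a univariate autoregressive process of order $p$. The symbols $a_i$, $z_t$, $C$, and $e_1$ are notation assumed here for illustration only. Writing $x_t=\sum_{i=1}^{p} a_i x_{t-i}+\varepsilon_t$ and stacking $z_t=(x_t,\dots,x_{t-p+1})^\top$ gives the first-order recursion
\[
z_t = C z_{t-1} + e_1\,\varepsilon_t, \qquad
C = \left(\begin{array}{ccccc}
a_1 & a_2 & \cdots & a_{p-1} & a_p \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{array}\right),
\]
where $e_1$ is the first standard basis vector. Iterating the recursion $h$ times yields
\[
z_{t+h} = C^{h} z_t + \sum_{j=0}^{h-1} C^{j} e_1\, \varepsilon_{t+h-j},
\]
so the dependence of future values on the current state is governed by $C^{h}$. The eigenvalues of $C$ are the roots of the polynomial $\lambda^{p} - a_1\lambda^{p-1} - \dots - a_p$.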
