\section{Introduction}
Spatio--temporal models are widely used by practitioners to explain economic, environmental, social, or biological phenomena such as peer influence, neighbourhood effects, contagion, epidemics, interdependent preferences, and climate change. A widely used spatio--temporal model is the spatial dynamic panel data model (SDPD) proposed and analysed by \cite{LeeYu10a}; see \cite{LeeYu10b} for a survey. To improve the adaptivity of SDPD models, \cite{DouAlt15} recently proposed a generalized model that assigns different coefficients to different locations and allows heteroskedastic and spatially correlated errors. The model is
\begin{equation}\label{eqn1}
{\mathbf y}_t = D(\boldsymbol{\lambda}_0){\mathbf W}{\mathbf y}_t + D({\boldsymbol{\lambda}_1}){\mathbf y}_{t-1} + D(\boldsymbol{\lambda}_2){\mathbf W}{\mathbf y}_{t-1} + \mbox{\boldmath$\varepsilon$}_t,
\end{equation}
where the vector ${\mathbf y}_t$ is of order $p$ and contains the observations at time $t$ from $p$ different locations; the errors $\mbox{\boldmath$\varepsilon$}_t$ are serially uncorrelated; the \emph{spatial matrix} ${\mathbf W}$ is a weight matrix with zero main diagonal and is assumed to be known; $D(\boldsymbol{\lambda}_j)$ with $j=0,1,2$ are diagonal matrices, and $\boldsymbol{\lambda}_j$ are the vectors with the spatial coefficients $\lambda_{ji}$ for $i=1,\ldots,p$. The \emph{generalized SDPD} model in (\ref{eqn1}) guarantees adaptivity by means of its $3p$ parameters. It is characterized by the sum of three terms: the \emph{spatial component}, driven by matrix ${\mathbf W}$ and the spatial parameter $\boldsymbol{\lambda}_0$; the \emph{dynamic component}, driven by the autoregressive parameter $\boldsymbol{\lambda}_1$; and the \emph{spatial--dynamic component}, driven by matrix ${\mathbf W}$ and the spatial--autoregressive parameter $\boldsymbol{\lambda}_2$. If the vectors $\boldsymbol{\lambda}_j$ are scalars for all $j$, then model (\ref{eqn1}) reduces to the classic SDPD of \cite{LeeYu10a}.
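
As a purely illustrative sketch (the two-location setup below is an assumption made for exposition and is not taken from the applications considered later), let $p=2$ and
\[
{\mathbf W}=\left(\begin{array}{cc} 0 & 1\\ 1 & 0 \end{array}\right).
\]
Then the first row of (\ref{eqn1}) reads
\[
y_{1t} = \lambda_{01}\,y_{2t} + \lambda_{11}\,y_{1,t-1} + \lambda_{21}\,y_{2,t-1} + \varepsilon_{1t},
\]
and the second row has the same form with the roles of the two locations exchanged and with coefficients $(\lambda_{02},\lambda_{12},\lambda_{22})$. In this small case, the three components of the model (spatial, dynamic, and spatial--dynamic) are visible term by term, and the parameter count is $3p=6$.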

The errors $\mbox{\boldmath$\varepsilon$}_t$ in model (\ref{eqn1}) are serially uncorrelated and may show heteroskedasticity and cross-correlation over space, so that $\mathop{var}(\mbox{\boldmath$\varepsilon$}_t)$ is a full matrix. This is a novelty compared with the \emph{SDPD} model of \cite{LeeYu10a}, where the errors must be cross-uncorrelated and homoskedastic in order to get consistency of the estimators. A setup similar to ours for the errors has also been considered by \cite{KelPru10} and \cite{Su12}, but not for panel models. However, their estimators are generally based on the instrumental variables technique, in order to overcome the endogeneity of the \emph{zero-lag} regressor. For the \emph{generalized SDPD} model, instead, \cite{DouAlt15} propose a new estimation procedure based on a generalized Yule--Walker approach. They show the consistency of the estimators under regularity assumptions. They also derive the convergence rate and the conditions under which the estimation procedure does not suffer in high-dimensional setups, notwithstanding the large number of parameters to be estimated (which grows to infinity with the dimension $p$).

In real data applications, it is important to check the validity of the assumptions required for the consistency of the estimation procedure. See, for example, the assumptions and asymptotic analysis in \cite{LeeYu10a} and \cite{DouAlt15} as well as the references therein. Checking such assumptions on real data is often not easy; at times, they are clearly violated.
Moreover, the spatial matrix ${\mathbf W}$ is assumed to be known, but in many cases, this is not true, and it must be estimated. For example, the spatial weights can be associated with ``similarities'' between spatial units and measured by estimated correlations. Another example is when the spatial weights are zeroes/ones, depending on the ``relationships'' between the spatial units, but the neighbourhood structure of ${\mathbf W}$ is unknown (\emph{i.e.}, it is not known where the ones must be allocated). In such cases, the performance of the \emph{SDPD} models needs to be investigated. Readers are advised to refer to recent papers on spatial matrix estimation (see, among others, \cite{LamSou16}).

Motivated by the above considerations, we propose a new version of the \emph{SDPD} model obtained by adding a constraint to the spatial parameters of the \emph{generalized SDPD} of \cite{DouAlt15}. New estimators of the parameters are proposed and investigated theoretically and empirically.

The new model is called \emph{stationary SDPD} and has several advantages.
First, the structure of the model and the interpretation of the parameters are simpler than the \emph{generalized SDPD} model, with the consequence that the assumptions underlying the theoretical results are clearer and can be checked easily with real data. Moreover, the estimation procedure is fast and simple to implement.

Second, the proposed estimators of the parameters are always unbiased and reach the $\sqrt{T}$ convergence rate (where $T$ is the temporal length of the time series) even in the high-dimensional case, although the number of parameters tends to infinity with the dimension $p$.

Last, but not least, our model allows wider application than the classic \emph{SDPD} model, and it is general enough to represent a wide range of multivariate linear processes that can be implicitly interpreted (when they are not explicitly interpretable) as spatio--temporal processes, with respect to a ``latent spatial matrix,'' which needs to be estimated. A big implication of this is that our model is not necessarily confined to the representation of strict spatio--temporal processes (where the spatial matrix is known), but it can also be considered as a valid alternative to the general \emph{VAR} models (where there is no spatial matrix), with two relevant advantages: i) more efficient estimation of the model and ii) the possibility of estimating the model even when $p>T$, thus avoiding the \emph{curse of dimensionality} that characterizes the \emph{VAR} models. Surprisingly, the simulation results show the remarkably better performance of our model and the new estimation procedure compared with the standard VAR model and the standard estimation procedure, even when the spatial matrix is latent and, therefore, to be estimated (see section \ref{matrixA}).

The rest of the paper is organized as follows. Section \ref{sdpd} presents the new model and discusses the issue of identifiability. Section \ref{est_alg} describes the estimation procedure. The theoretical results are shown in section \ref{asymptotic}. The empirical performance of the estimation procedure is investigated in section \ref{simulazioni}. Finally, all the proofs are provided in the Appendix.


\section{A constrained spatio--temporal model: the stationary SDPD}\label{sdpd}
In the sequel, we assume that ${\mathbf y}_1, \cdots, {\mathbf y}_T$ are the observations from a stationary process defined by (\ref{eqn3}). The transpose of a matrix ${\mathbf A}$ is denoted with ${\mathbf A}^T$.
We assume that the process has mean zero and denote with $\boldsymbol{\Sigma}_j=\mathop{cov}({\mathbf y}_t,{\mathbf y}_{t-j})=E({\mathbf y}_t{\mathbf y}_{t-j}^T)$ the covariance matrix of the process at the lag $j$.
The \emph{generalized SDPD} model in (\ref{eqn1}) can be reformulated as follows.
\begin{equation}
\left[{\mathbf I}_p-D(\boldsymbol{\lambda}_0){\mathbf W}\right]{\mathbf y}_t = D({\boldsymbol{\lambda}_1})\left[{\mathbf I}_p - D(\boldsymbol{\lambda}^+_2){\mathbf W}\right]{\mathbf y}_{t-1} + \mbox{\boldmath$\varepsilon$}_t,  \label{eqn3}
\end{equation}
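where $\boldsymbol{\lambda}^+_2$ is the vector with elements $\lambda^+_{2i}=-\lambda_{2i}/\lambda_{1i}$, obtained by dividing the elements of $\boldsymbol{\lambda}_2$ by the corresponding elements of $\boldsymbol{\lambda}_1$ and changing the sign (assuming, for now, that all the coefficients in $\boldsymbol{\lambda}_1$ are different from zero), so that $D(\boldsymbol{\lambda}_1){\mathbf y}_{t-1}+D(\boldsymbol{\lambda}_2){\mathbf W}{\mathbf y}_{t-1}=D(\boldsymbol{\lambda}_1)\left[{\mathbf I}_p-D(\boldsymbol{\lambda}^+_2){\mathbf W}\right]{\mathbf y}_{t-1}$. Note that model (\ref{eqn3}) is equivalent to a multivariate (auto)regression between a linear combination of ${\mathbf y}_t$ and a linear combination of the lag ${\mathbf y}_{t-1}$, where the weights of the two linear combinations depend on ${\mathbf W}$ and the coefficients $\boldsymbol{\lambda}_0$ and $\boldsymbol{\lambda}^+_2$, respectively. Writing ${\mathbf z}_t^{(\boldsymbol{\lambda},{\mathbf W})}=\left[{\mathbf I}_p-D(\boldsymbol{\lambda}){\mathbf W}\right]{\mathbf y}_t$ for a generic coefficient vector $\boldsymbol{\lambda}$, model (\ref{eqn3}) becomes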
\begin{equation}\label{zeta}
{\mathbf z}_t^{(\boldsymbol{\lambda}_0,{\mathbf W})} = D({\boldsymbol{\lambda}_1}){\mathbf z}_{t-1}^{(\boldsymbol{\lambda}^+_2,{\mathbf W})} + \mbox{\boldmath$\varepsilon$}_t.
\end{equation}

Some special cases may arise from model (\ref{eqn3}) by adding restrictions on the parameters $\boldsymbol{\lambda}_{j}$.
First, if we assume that the spatial parameters are constant over space, that is, $\boldsymbol{\lambda}_{j}$ is scalar for $j=0,1,2$, then we obtain the classic \emph{SDPD} model of \cite{LeeYu10a}.




Another constrained model, proposed and analysed in this paper, may be derived by assuming that $\boldsymbol{\lambda}_0=\boldsymbol{\lambda}^+_2$.
The reason underlying the choice of this constraint is a generalized assumption of stationarity. In time series analysis, stationarity means that the dependence structure of the process is constant (in some sense) over time. In particular, second-order stationarity assumes that correlations between the observations $({\mathbf y}_t,{\mathbf y}_{t-j})$ depend on the lag $j$ but not on $t$, implying that $\mathop{var}({\mathbf y}_t)$ is constant for all $t$. However, in spatio--temporal time series, there are two kinds of correlations: \emph{temporal correlations}, involving observations at different time points, and \emph{spatial correlations}, involving observations at different spatial units. As we refer to stationarity, it makes sense to assume that spatial correlations are also time-invariant, which means that the weights in (\ref{zeta}) must not change over time, thus, $\left[{\mathbf I}_p-D(\boldsymbol{\lambda}_0){\mathbf W}\right]=\left[{\mathbf I}_p-D(\boldsymbol{\lambda}^+_2){\mathbf W}\right]$, also implying that $\mathop{var}({\mathbf z}_t)$ is the same for all $t$. Therefore, we add the constraint $\boldsymbol{\lambda}_0=\boldsymbol{\lambda}^+_2$, and the model becomes
\begin{equation} \label{b1}
\left[{\mathbf I}_p-D(\boldsymbol{\lambda}_0){\mathbf W}\right]{\mathbf y}_t = D(\boldsymbol{\lambda}_1)\left[{\mathbf I}_p-D(\boldsymbol{\lambda}_0){\mathbf W}\right]{\mathbf y}_{t-1} +  \mbox{\boldmath$\varepsilon$}_t.
\end{equation}

We denote the model as \emph{stationary SDPD}.
Model (\ref{b1}) has several advantages that will be shown in the following sections.  Above all, imposing spatio--temporal stationarity helps gain efficiency while still preserving the spatial adaptability that characterizes the \emph{generalized SDPD} model of \cite{DouAlt15}. Moreover, our model allows representation of a wide range of multivariate processes by means of a simple model subject to few assumptions that can be easily checked using real data.
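
To see what the constraint implies in terms of the original parametrization, one may expand the products in (\ref{b1}). The following is only a rearrangement of the displayed model, written out as a sketch:
\[
{\mathbf y}_t = D(\boldsymbol{\lambda}_0){\mathbf W}{\mathbf y}_t + D(\boldsymbol{\lambda}_1){\mathbf y}_{t-1} - D(\boldsymbol{\lambda}_1)D(\boldsymbol{\lambda}_0){\mathbf W}{\mathbf y}_{t-1} + \mbox{\boldmath$\varepsilon$}_t .
\]
Comparing this expression with (\ref{eqn1}) term by term, the \emph{stationary SDPD} corresponds to a \emph{generalized SDPD} whose spatial--dynamic coefficients satisfy $\lambda_{2i}=-\lambda_{1i}\lambda_{0i}$ for $i=1,\ldots,p$, so that only $2p$ free parameters remain.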

Finally, it is worthwhile to stress the difference between the \emph{SDPD} model of \cite{LeeYu10a}, the \emph{generalized SDPD} model of \cite{DouAlt15}, and the \emph{stationary SDPD} model proposed here. The first model imposes that the spatial relationships be the same for all units, since the coefficients $\lambda_j$ (with $j=0,1,2$) do not change with $i=1,\ldots,p$. Instead, the \emph{stationary SDPD} model in (\ref{b1}) allows varied coefficients for different spatial units, as in the \emph{generalized SDPD} of \cite{DouAlt15}, but they are assumed to be time-invariant thanks to a constraint on the time-lagged parameters. Of course, the estimation procedures vary for the three cases in terms of the convergence rates. The constrained model underlying our \emph{stationary SDPD} allows the estimation procedure to reach the $\sqrt{T}$ convergence rate and to guarantee unbiased estimators, whatever the dimension $p$ and even when $p\rightarrow\infty$ at any rate. This is a big improvement with respect to the other two models. In fact, for the classic \emph{SDPD} model, the estimators are characterized by a $\sqrt{Tp}$ convergence rate (which is faster than that of our model, since they have only three parameters to estimate instead of $2p$), but a bias of order $T^{-1}$ exists, and it does not vanish when $p/T\rightarrow\infty$ (see Theorem 3 of \cite{LeeYu10a}). On the other hand, the convergence rates of the estimators in the \emph{generalized SDPD} model are slower than those of our model and deteriorate when $p\rightarrow\infty$ at a rate faster than $\sqrt{T}$ (see Theorem 2 of \cite{DouAlt15}).


\subsection{Identification of parameters in the case of cross-uncorrelated errors}

In this section, we assume, for simplicity, that the matrix $\boldsymbol{\Sigma}_0^{\varepsilon}=\mathop{var}(\mbox{\boldmath$\varepsilon$}_t)$ is diagonal (\emph{i.e.}, there is heteroskedasticity  but no cross-correlation in the error process) and discuss the identifiability of the model. In the next section, we generalize the problem by also adding some cross-correlations in the error process.

Defining ${\mathbf z}_t^{(0)}=\left[{\mathbf I}_p-D(\boldsymbol{\lambda}_0){\mathbf W}\right]{\mathbf y}_t$, model (\ref{b1}) can be reformulated as
\begin{eqnarray}
{\mathbf z}_t^{(0)} &=& D(\boldsymbol{\lambda}_1){\mathbf z}_{t-1}^{(0)} + \mbox{\boldmath$\varepsilon$}_t, \label{b1ter}
\end{eqnarray}
which is a transformed \emph{VAR} process with uncorrelated components, since $D(\boldsymbol{\lambda}_1)$ is diagonal.
Given that we assume that $\boldsymbol{\Sigma}_0^{\varepsilon}$ is also diagonal, the coefficients $\lambda_{1i}$ for $i=1,\ldots,p$, represent the slopes of $p$ univariate autoregressive models with respect to the latent variables $z_{it}^{(0)}$. Therefore, $\lambda_{1i}\equiv\mathop{cor}(z_{it}^{(0)}, z_{i,t-1}^{(0)})$.
From (\ref{b1}), it follows that
\begin{equation}
\left[{\mathbf I}_p-D(\boldsymbol{\lambda}_0){\mathbf W}\right]\boldsymbol{\Sigma}_1 = D(\boldsymbol{\lambda}_1)\left[{\mathbf I}_p-D(\boldsymbol{\lambda}_0){\mathbf W}\right]\boldsymbol{\Sigma}_0 \label{seconda}
\end{equation}
and for the $i$-th equation,
\begin{equation}\label{vincolo}
({\mathbf e}^T_i-\lambda_{0i}{\mathbf w}^T_i)\boldsymbol{\Sigma}_{1}=\lambda_{1i}({\mathbf e}^T_i-\lambda_{0i}{\mathbf w}^T_i)\boldsymbol{\Sigma}_0,
\end{equation}
where ${\mathbf e}_i$ is the column vector with its $i$-th component equal to one and all the others equal to zero, while ${\mathbf w}_i$ is the column vector containing the $i$-th row of matrix ${\mathbf W}$.
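
The step from (\ref{b1}) to (\ref{seconda}) can be sketched as follows, under the zero-mean and stationarity assumptions stated above. Postmultiplying (\ref{b1}) by ${\mathbf y}_{t-1}^T$ and taking expectations gives
\[
\left[{\mathbf I}_p-D(\boldsymbol{\lambda}_0){\mathbf W}\right]E({\mathbf y}_t{\mathbf y}_{t-1}^T) = D(\boldsymbol{\lambda}_1)\left[{\mathbf I}_p-D(\boldsymbol{\lambda}_0){\mathbf W}\right]E({\mathbf y}_{t-1}{\mathbf y}_{t-1}^T) + E(\mbox{\boldmath$\varepsilon$}_t{\mathbf y}_{t-1}^T),
\]
where the last term vanishes because ${\mathbf y}_{t-1}$ depends only on the errors up to time $t-1$, which are uncorrelated with $\mbox{\boldmath$\varepsilon$}_t$. Postmultiplying (\ref{vincolo}) by $({\mathbf e}_i-\lambda_{0i}{\mathbf w}_i)$ then yields, provided that the denominator is positive,
\[
\lambda_{1i}=\frac{({\mathbf e}^T_i-\lambda_{0i}{\mathbf w}^T_i)\boldsymbol{\Sigma}_{1}({\mathbf e}_i-\lambda_{0i}{\mathbf w}_i)}{({\mathbf e}^T_i-\lambda_{0i}{\mathbf w}^T_i)\boldsymbol{\Sigma}_{0}({\mathbf e}_i-\lambda_{0i}{\mathbf w}_i)}
=\frac{\mathop{cov}(z_{it}^{(0)}, z_{i,t-1}^{(0)})}{\mathop{var}(z_{i,t-1}^{(0)})},
\]
which coincides with $\mathop{cor}(z_{it}^{(0)}, z_{i,t-1}^{(0)})$ under stationarity, since $\mathop{var}(z_{it}^{(0)})=\mathop{var}(z_{i,t-1}^{(0)})$. This ratio is the population counterpart of the one in (\ref{stimatore1}).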

Under general assumptions, (\ref{vincolo}) admits only one solution with respect to $\lambda_{0i}$ and $\lambda_{1i}$ (see Theorem \ref{theorem1}), which can be found among the extreme points of $\lambda_{1i}=\mathop{cor}(z_{it}^{(0)}, z_{i,t-1}^{(0)})$ as a function of $\lambda_{0i}$.
To provide insight into this, the first two plots of figure \ref{figure2} show two examples based on model 1 used in the simulation study. Denote with ($\lambda_{0i}^*,\lambda_{1i}^*)$ the true values of the coefficients used in model 1 for a given location $i$ (in particular, in figure \ref{figure2}, the first two plots refer to locations $i=6$ and $i=8$). The solid line shows $\lambda_{1i}=\mathop{cor}(z_{it}^{(0)}, z_{i,t-1}^{(0)})$ as a function of $\lambda_{0i}$. The two dots show the points of this function where the first derivative is zero. The vertical and horizontal dashed lines identify which one of the two points satisfies the sufficient condition in (\ref{vincolo}). As expected, it coincides with the true values ($\lambda_{0i}^*,\lambda_{1i}^*)$ used to generate the time series.
Theorem \ref{theorem1}, proved in the Appendix, formalizes this result.
\begin{theorem}\label{theorem1}
Consider model (\ref{b1}) for a stationary process ${\mathbf y}_t$ with mean zero, and assume that the error process $\mbox{\boldmath$\varepsilon$}_t$ is such that $\boldsymbol{\Sigma}_0^{\varepsilon}=\mathop{var}(\mbox{\boldmath$\varepsilon$}_t)$ is diagonal (\emph{i.e.}, there is heteroskedasticity but no cross-correlation in the errors). Under assumptions $A1$--$A4$ in section \ref{asymptotic}, the following results hold:
\begin{enumerate}
\item There exists a unique pair of values $(\lambda_{0i}^*,\lambda_{1i}^*)$ satisfying the following system of equations:
\begin{equation}\label{vinc}
({\mathbf e}^T_i-\lambda_{0i}{\mathbf w}^T_i)\boldsymbol{\Sigma}_{1}-\lambda_{1i}({\mathbf e}^T_i-\lambda_{0i}{\mathbf w}^T_i)\boldsymbol{\Sigma}_0={\bf 0}^T, \qquad\qquad i=1,\ldots,p,
\end{equation}
where ${\mathbf e}_i$ is the $i$-th unit vector, and ${\mathbf w}_i$ contains the $i$-th row of the spatial matrix ${\mathbf W}$.
\item Such a point, $(\lambda_{0i}^*,\lambda_{1i}^*)$, is also the solution of the following second-order polynomial equation:
\begin{eqnarray} \label{nec_cond}
\left.\frac{\partial\mathop{cov}(z_{it}^{(0)}, z_{i,t-1}^{(0)})}{\partial\lambda_{0i}}\right|_{\lambda_{0i}=\lambda_{0i}^*}- \lambda_{1i}^*\left.\frac{\partial\mathop{var}(z_{i,t-1}^{(0)})}{\partial\lambda_{0i}}\right|_{\lambda_{0i}=\lambda_{0i}^*} &=& 0.
\end{eqnarray}
\end{enumerate}
\end{theorem}

\noindent\textbf{Remark 1:} Theorem \ref{theorem1} not only shows that the \emph{stationary SDPD} model is well identified, because there is a unique solution for $(\boldsymbol{\lambda}_0,\boldsymbol{\lambda}_1)$, but it also suggests a way to estimate such parameters. In fact, we can find all the solutions to equation (\ref{nec_cond}) and then check which one satisfies the sufficient condition in (\ref{vincolo}). This estimation procedure is described in section \ref{est_alg}.


\subsection{Identification of parameters in the case of cross-correlated errors}

Now, we relax the assumption on the error $\mbox{\boldmath$\varepsilon$}_t$ by letting $\boldsymbol{\Sigma}_0^\varepsilon$ be a full matrix (\emph{i.e.}, there is heteroskedasticity and cross-correlation in the error process). This setup allows the process ${\mathbf y}_t$ to include some {spurious cross-correlation} not explained by ${\mathbf W}$. In this case, the coefficients $\lambda_{1i}$ still give the correlations between the latent variables $z_{it}^{(0)}$ and $z_{i,t-1}^{(0)}$, but now, the $p$ equations in model (\ref{b1ter}) are correlated. The main consequence of this is that the true values $(\lambda_{0i}^*, \lambda_{1i}^*)$ do not identify an extreme point of the correlation function (see case $i=2$ in figure \ref{figure2}). Nevertheless, the sufficient condition in (\ref{vincolo}) is still valid, and the true coefficients $(\lambda_{0i}^*, \lambda_{1i}^*)$ can be identified by introducing a ``constrained'' condition.

\begin{theorem}\label{theorem1bis}
Consider model (\ref{b1}) for a stationary process ${\mathbf y}_t$ with mean zero, and assume that the error process $\mbox{\boldmath$\varepsilon$}_t$ is such that $\boldsymbol{\Sigma}_0^{\varepsilon}=\mathop{var}(\mbox{\boldmath$\varepsilon$}_t)$ is a full matrix (\emph{i.e.}, there is heteroskedasticity and cross-correlation in the errors). Under assumptions $A1$--$A4$ in section \ref{asymptotic}, the following results hold:
\begin{enumerate}
\item There exists a unique pair of values $(\lambda_{0i}^*,\lambda_{1i}^*)$ satisfying the following system of equations
\[
({\mathbf e}^T_i-\lambda_{0i}{\mathbf w}^T_i)\boldsymbol{\Sigma}_{1}-\lambda_{1i}({\mathbf e}^T_i-\lambda_{0i}{\mathbf w}^T_i)\boldsymbol{\Sigma}_0={\bf 0}^T, \qquad\qquad i=1,\ldots,p,
\]
where ${\mathbf e}_i$ is the $i$-th unit vector, and ${\mathbf w}_i$ contains the $i$-th row of the spatial matrix ${\mathbf W}$;
\item such a point, $(\lambda_{0i}^*,\lambda_{1i}^*)$, is also the solution of the following second-order polynomial equation:
\begin{eqnarray*}
\left.\frac{\partial\mathop{cov}(z_{it}^{(0)}, z_{i,t-1}^{(0)})}{\partial\lambda_{0i}}\right|_{\lambda_{0i}=\lambda_{0i}^*}- \lambda_{1i}^*\left.\frac{\partial\mathop{var}(z_{i,t-1}^{(0)})}{\partial\lambda_{0i}}\right|_{\lambda_{0i}=\lambda_{0i}^*} &=& ({\mathbf e}_i^T-\lambda_{0i}^*{\mathbf w}^T_i)(\boldsymbol{\Sigma}_1-\boldsymbol{\Sigma}_1^T){\mathbf w}_i.
\end{eqnarray*}
\end{enumerate}
\end{theorem}

\noindent\textbf{Remark 2:} When the errors $\mbox{\boldmath$\varepsilon$}_t$ are not cross-correlated, the matrix $\boldsymbol{\Sigma}_1$ is symmetric by Lemma \ref{lemma1} in the Appendix, so that point 2 in Theorem \ref{theorem1bis} becomes the same as in Theorem \ref{theorem1}. Therefore, Theorem \ref{theorem1bis} includes Theorem \ref{theorem1} as a special case.



\section{Estimation procedure}\label{est_alg}
We present here a simple algorithm for the estimation of the parameters $(\lambda_{0i},\lambda_{1i})$ for $i=1,\ldots,p$. First, estimate the matrices $\boldsymbol{\Sigma}_0$ and $\boldsymbol{\Sigma}_1$ through some consistent estimators $\hat{\boldsymbol{\Sigma}}_0$ and $\hat{\boldsymbol{\Sigma}}_1$. For example, the sample autocovariances $\hat{\boldsymbol{\Sigma}}_0=T^{-1}\sum_{t=1}^{T}{\mathbf y}_t{\mathbf y}_t^T$ and $\hat{\boldsymbol{\Sigma}}_1=(T-1)^{-1}\sum_{t=2}^{T}{\mathbf y}_t{\mathbf y}_{t-1}^T$ can be used, in agreement with the definition $\boldsymbol{\Sigma}_j=E({\mathbf y}_t{\mathbf y}_{t-j}^T)$. Alternatively, the threshold estimator analyzed in \cite{CheAlt13} can be considered in the high dimensional setup. Then, for each location $i=1,\ldots,p$, implement the following steps.
\begin{enumerate}
\item Define ${\mathbf e}_i$ as the $i$-th unit vector and ${\mathbf w}^T_i={\mathbf e}^T_i{\mathbf W}$, then compute:
\begin{eqnarray*}
\hat a_{0i} &=& {\mathbf e}_i^T\hat\boldsymbol{\Sigma}_0{\mathbf e}_i, \quad \hat a_{1i} = {\mathbf e}_i^T\hat\boldsymbol{\Sigma}_1 {\mathbf e}_i, \quad
\hat a_{2i} = {\mathbf e}_i^T(\hat\boldsymbol{\Sigma}_1^T-\hat\boldsymbol{\Sigma}_1){\mathbf w}_i, \\
\hat b_{0i} &=& -2{\mathbf e}_i^T\hat\boldsymbol{\Sigma}_0{\mathbf w}_i, \quad \hat b_{1i} =  -{\mathbf e}^T_i(\hat\boldsymbol{\Sigma}_1+\hat\boldsymbol{\Sigma}_1^T){\mathbf w}_i, \\
 \hat c_{0i} &=& {\mathbf w}^T_i\hat\boldsymbol{\Sigma}_0{\mathbf w}_i,\quad \hat c_{1i} = {\mathbf w}^T_i\hat\boldsymbol{\Sigma}_1{\mathbf w}_i.
\end{eqnarray*}
\item Find the two roots $\lambda_{0i}^{(1)}$ and $\lambda_{0i}^{(2)}$ of the following second-order polynomial equation:
\begin{equation}\label{eqq}
\hat t_{0i} + \hat t_{1i}\lambda_{0i} + \hat t_{2i}\lambda_{0i}^2 =0,
\end{equation}
where $\hat t_{0i} = \hat b_{1i}\hat a_{0i}-\hat b_{0i}\hat a_{1i}+\hat a_{0i}\hat a_{2i}$, $\hat t_{1i} = 2(\hat a_{0i}\hat c_{1i}-\hat c_{0i}\hat a_{1i})+\hat a_{2i}\hat b_{0i}$, and $\hat t_{2i} = \hat c_{1i}\hat b_{0i}-\hat c_{0i}\hat b_{1i}+\hat a_{2i}\hat c_{0i}$.
\item Estimate $\lambda_{0i}$ and $\lambda_{1i}$ by
\begin{eqnarray}\label{stimatore0}
\hat\lambda_{0i}&=&\lambda_{0i}^{(\hat\jmath)}, \qquad \hat\jmath=\arg\min_{j=1,2}{\mathbf v}_{ij}^T{\mathbf v}_{ij}, \\
\hat\lambda_{1i}&=&\frac{({\mathbf e}_i^T-\hat\lambda_{0i}{\mathbf w}^T_i)\hat\boldsymbol{\Sigma}_1({\mathbf e}_i-\hat\lambda_{0i}{\mathbf w}_i)}{({\mathbf e}_i^T-\hat\lambda_{0i}{\mathbf w}^T_i)\hat\boldsymbol{\Sigma}_0({\mathbf e}_i-\hat\lambda_{0i}{\mathbf w}_i)}, \label{stimatore1}
\end{eqnarray}
where
$
{\mathbf v}^T_{ij} = ({\mathbf e}_i^T-\lambda_{0i}^{(j)}{\mathbf w}^T_i)\hat{\boldsymbol{\Sigma}}_1-\lambda_{1i}^{(j)}({\mathbf e}_i^T-\lambda_{0i}^{(j)}{\mathbf w}^T_i)\hat{\boldsymbol{\Sigma}}_0
$
and $\lambda_{1i}^{(j)} = \left[({\mathbf e}_i^T-\lambda_{0i}^{(j)}{\mathbf w}^T_i)\hat{\boldsymbol{\Sigma}}_1({\mathbf e}_i-\lambda_{0i}^{(j)}{\mathbf w}_i)\right]/\left[({\mathbf e}_i^T-\lambda_{0i}^{(j)}{\mathbf w}^T_i)\hat{\boldsymbol{\Sigma}}_0({\mathbf e}_i-\lambda_{0i}^{(j)}{\mathbf w}_i)\right]$.
\end{enumerate}
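
The coefficients of the polynomial in (\ref{eqq}) can be recovered with a short calculation, sketched here at the population level (hats are dropped, and the shorthand ${\mathbf v}_i$, $N_i$, and $V_i$ is introduced only for this sketch). Let ${\mathbf v}_i={\mathbf e}_i-\lambda_{0i}{\mathbf w}_i$ and write
\[
N_i(\lambda_{0i})={\mathbf v}_i^T\boldsymbol{\Sigma}_1{\mathbf v}_i=a_{1i}+b_{1i}\lambda_{0i}+c_{1i}\lambda_{0i}^2,
\qquad
V_i(\lambda_{0i})={\mathbf v}_i^T\boldsymbol{\Sigma}_0{\mathbf v}_i=a_{0i}+b_{0i}\lambda_{0i}+c_{0i}\lambda_{0i}^2,
\]
so that $\lambda_{1i}=N_i/V_i$ as in (\ref{stimatore1}). Postmultiplying (\ref{vincolo}) by ${\mathbf w}_i$ gives the necessary condition ${\mathbf v}_i^T\boldsymbol{\Sigma}_1{\mathbf w}_i=\lambda_{1i}{\mathbf v}_i^T\boldsymbol{\Sigma}_0{\mathbf w}_i$. Denoting with a prime the derivative with respect to $\lambda_{0i}$, the definitions in step 1 imply ${\mathbf v}_i^T\boldsymbol{\Sigma}_0{\mathbf w}_i=-V_i'/2$ and ${\mathbf v}_i^T\boldsymbol{\Sigma}_1{\mathbf w}_i=-(N_i'+a_{2i})/2$. Substituting $\lambda_{1i}=N_i/V_i$ and multiplying both sides by $-2V_i$, the condition becomes
\[
N_i'V_i-N_iV_i'+a_{2i}V_i=0 .
\]
Expanding the products, the cubic terms cancel and we are left with
\begin{eqnarray*}
N_i'V_i-N_iV_i'+a_{2i}V_i &=& \left(b_{1i}a_{0i}-b_{0i}a_{1i}+a_{0i}a_{2i}\right)
+\left[2(a_{0i}c_{1i}-c_{0i}a_{1i})+a_{2i}b_{0i}\right]\lambda_{0i} \\
&& +\left(c_{1i}b_{0i}-c_{0i}b_{1i}+a_{2i}c_{0i}\right)\lambda_{0i}^2 ,
\end{eqnarray*}
whose three coefficients agree with $t_{0i}$, $t_{1i}$, and $t_{2i}$. Being only necessary, this condition is satisfied by both roots, which is why step 3 selects the root that is closest to satisfying the full vector condition in (\ref{vincolo}).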



\vspace{10pt}\noindent\textbf{Remark 3:} Assumption $A2$ in section \ref{asymptotic} guarantees that matrix $\left[{\mathbf I}_p-D(\boldsymbol{\lambda}_0){\mathbf W}\right]$ has full rank. However, the above estimation procedure may suffer for some locations if matrix $\left[{\mathbf I}_p-D(\boldsymbol{\lambda}_0){\mathbf W}\right]$ is nearly singular. Such a case may come about because of the presence of some almost linearly dependent rows in the matrix, which may cause the quantity ${\mathbf w}_i^T[\boldsymbol{\Sigma}_1-\lambda_{1i}\boldsymbol{\Sigma}_0]{\mathbf w}_i$ to be almost zero for those rows (see Lemma \ref{lemma2}). As a result, the procedure loses efficiency in the estimation of $\lambda_{0i}$ for those locations (but it still works for $\lambda_{1i}$). Something similar may happen if there are some (almost) zero rows in ${\mathbf W}$, a case that is excluded by assumption $A4$. Nevertheless, it is worthwhile to stress that the estimation procedure works efficiently for all the other locations. In fact, the procedure does not require the inversion of matrix $\left[{\mathbf I}_p-D(\boldsymbol{\lambda}_0){\mathbf W}\right]$, so it is able to isolate and separate the effects of ``collinear'' locations (or uncorrelated locations) from the other locations and to guarantee consistent and efficient estimates for the latter.



\section{Theoretical results}\label{asymptotic}

In this section, we show the theoretical foundations of our proposal. In particular, we present the assumptions and show the consistency and the asymptotic normality of the estimators, for the cases of finite dimension and high dimension. Moreover, we show that the \emph{stationary SDPD} model can be used to represent a wide range of multivariate linear processes with respect to a ``latent spatial matrix,'' and therefore, it is of wider application than classic spatio--temporal contexts.

The reduced form of model (\ref{b1}) is
\begin{equation}\label{b1bis}
{\mathbf y}_t={\mathbf A}^*{\mathbf y}_{t-1}+\mbox{\boldmath$\varepsilon$}_t^*,
\end{equation}
where $\mbox{\boldmath$\varepsilon$}_t^*=\left[{\mathbf I}_p-D(\boldsymbol{\lambda}_0){\mathbf W}\right]^{-1}\mbox{\boldmath$\varepsilon$}_t$ and
\begin{equation}\label{diagonalize}
{\mathbf A}^*=\left[{\mathbf I}_p-D(\boldsymbol{\lambda}_0){\mathbf W}\right]^{-1}D(\boldsymbol{\lambda}_1)\left[{\mathbf I}_p-D(\boldsymbol{\lambda}_0){\mathbf W}\right].
\end{equation}
Note that the errors $\mbox{\boldmath$\varepsilon$}_t^*$ have mean zero and are serially uncorrelated. Model (\ref{b1bis}) has a \emph{VAR} representation, so it is stationary when all the eigenvalues of matrix ${\mathbf A}^*$ are smaller than one in absolute value. From (\ref{diagonalize}), we can note that $\boldsymbol{\lambda}_1$ contains the eigenvalues of ${\mathbf A}^*$ whereas $\boldsymbol{\lambda}_0$ only affects its eigenvectors (see the proof of Theorem \ref{theorem1}). Therefore, we must consider the following assumptions:
\begin{itemize}
\item[A1)] $\lambda_{1i}\in\mathbb{R}$ and $|\lambda_{1i}|<1$, for all $i$, and vector $\boldsymbol{\lambda}_1$ is not scalar;
\item[A2)] $\lambda_{0i}\in\mathbb{R}$ for all $i$ and vector $\boldsymbol{\lambda}_0$ is such that matrix $\left[{\mathbf I}_p-D(\boldsymbol{\lambda}_0){\mathbf W}\right]$ has full rank;
\item[A3)] the errors $\varepsilon_{it}$ are serially independent and such that $E(\varepsilon_{it})=0$ and $E|\varepsilon_{it}|^\delta<\infty$ for all $i,t$, for some $\delta>4$;
\item[A4)] the spatial matrix ${\mathbf W}$ is nonsingular and has zero main diagonal.
\end{itemize}
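
A brief sketch of why the eigenvalues of ${\mathbf A}^*$ are the elements of $\boldsymbol{\lambda}_1$ may be useful here (the shorthand ${\mathbf Q}$ is introduced only for this illustration). Let ${\mathbf Q}={\mathbf I}_p-D(\boldsymbol{\lambda}_0){\mathbf W}$, which is invertible under $A2$. Premultiplying (\ref{b1}) by ${\mathbf Q}^{-1}$ gives (\ref{b1bis}) with ${\mathbf A}^*={\mathbf Q}^{-1}D(\boldsymbol{\lambda}_1){\mathbf Q}$, so that ${\mathbf Q}{\mathbf A}^*=D(\boldsymbol{\lambda}_1){\mathbf Q}$. Reading this identity row by row,
\[
({\mathbf e}_i^T-\lambda_{0i}{\mathbf w}_i^T){\mathbf A}^*=\lambda_{1i}({\mathbf e}_i^T-\lambda_{0i}{\mathbf w}_i^T), \qquad i=1,\ldots,p,
\]
that is, each $\lambda_{1i}$ is an eigenvalue of ${\mathbf A}^*$ with left eigenvector ${\mathbf e}_i-\lambda_{0i}{\mathbf w}_i$. The stationarity requirement on ${\mathbf A}^*$ therefore involves only $\boldsymbol{\lambda}_1$, while $\boldsymbol{\lambda}_0$ enters through the eigenvectors.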

Assumption $A1$ implies stationarity. Moreover, it guarantees that there are at least two distinct values in vector $\boldsymbol{\lambda}_1$ so that model (\ref{b1}) is identifiable, as shown in Theorem \ref{theorem1}. Assumption $A2$ is clearly necessary to assure that matrix $\left[{\mathbf I}_p-D(\boldsymbol{\lambda}_0){\mathbf W}\right]$ can be inverted so that the reduced model in (\ref{b1bis}) is well defined (Remark 3 indicates what happens when this assumption is not satisfied). Incidentally, it is worthwhile to note that our setup automatically solves the problem concerning the parameter space of $\boldsymbol{\lambda}_0$, highlighted at the end of section 2.2 by \cite{KelPru10}. So, in the empirical applications of our model, it is possible to use any kind of normalization for ${\mathbf W}$ (\emph{i.e.}, row-factor normalization or single-factor normalization), since the vector $\boldsymbol{\lambda}_0$ would automatically rescale accordingly (see section \ref{high} for more details on this aspect). This means that the coefficients $\lambda_{0i}$ can also take values outside the classic interval $[-1,1]$.
Assumption $A3$ assures that the results in \cite{Han76} can be applied to show the asymptotic normality of the estimators.
Assumption $A4$ is classic in spatio--temporal models and guarantees that the model is well defined and identifiable with respect to all the parameters, also for $p\rightarrow\infty$ (see Lemma \ref{lemma2} and Theorem \ref{theorem4}).

Under assumptions $A1-A4$, it is immediately evident that the estimators $\hat\boldsymbol{\lambda}_{0}$ and $\hat\boldsymbol{\lambda}_{1}$, presented in section \ref{est_alg}, are both consistent following Theorem 11.2.1 in \cite{BroDav86}. For asymptotic normality, the following theorem can be stated.
\begin{theorem}\label{theorem3}
Consider $\hat\lambda_{0i}$ and $\hat\lambda_{1i}$, the estimators obtained by the algorithm in section \ref{est_alg}. Under assumptions $A1-A4$, we have for finite $p$
\begin{equation}
\sqrt{T}(\hat\lambda_{ji}-\lambda_{ji}) \stackrel{d}{\longrightarrow}N(0,{\mathbf D}_{ji}^T{\mathbf V}_{ji}{\mathbf D}_{ji}) \nonumber\qquad\qquad  j=0,1;\quad i=1,\ldots,p,
\end{equation}
where ${\mathbf D}_{ji}$ are the $K_i\times 1$ vectors, and ${\mathbf V}_{ji}$ are the matrices of order $K_i$ with $K_i \le 2p^2$ (see the proof).
\end{theorem}

Note that the estimators $\widehat\lambda_{ji}$ are unbiased for all $i,j$ and for all $p$. 

In the high dimension, the number of parameters to estimate ($2p$ in total) grows to infinity with $p$. Therefore, we must assure that the consistency of the estimators is still valid in such a case. As expected, the properties of matrix ${\mathbf W}$ influence the consistency and the convergence rates of the estimators $\hat\lambda_{ji}$ when $p\rightarrow\infty$. For example, denote with $k_i$ the number of nonzero elements in vector ${\mathbf w}_i$. If $k_i=O(1)$ as $p\rightarrow\infty$, for all $i$, then the effective dimension of model (\ref{b1}) is finite and Theorem \ref{theorem3} can still be applied for the consistency and the asymptotic normality of the estimators $\hat\lambda_{ji}$, even if $p\rightarrow\infty$. The following Theorem \ref{theorem4}, instead, shows the consistency of the estimators under more general vectors ${\mathbf w}_i$, with $k_i\rightarrow\infty$ as $p\rightarrow\infty$.

\subsection{Asymptotics for high dimensional setups}\label{high}
In model (\ref{b1}), the spatial correlation between the $i$-th location and the other locations is summarized by $\lambda_{0i}{\mathbf w}_i$. If the vector ${\mathbf w}_i$ is rescaled by a factor $\delta_i$, then we obtain an equivalent model by rescaling the spatial coefficient $\lambda_{0i}$ by the inverse of the same factor, since $\lambda_{0i}{\mathbf w}_i=\delta_i^{-1}\lambda_{0i}{\mathbf w}_i\delta_i=\lambda_{0i,\delta}{\mathbf w}_{i,\delta}$. In this way, a row-normalization of matrix ${\mathbf W}$ is irrelevant, provided that the coefficients in $D(\boldsymbol{\lambda}_{0})$ rescale accordingly. Such an approach is not new and follows the idea of \cite{KelPru10}. We use this approach here in order to simplify the analysis and the interpretation of the \emph{stationary SDPD} model in the high dimensional setup.
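
As an illustration of this rescaling (the binary weights below are an assumption made only for the example), suppose that ${\mathbf w}_i$ has $k_i$ entries equal to one and all the others equal to zero. Normalizing ${\mathbf w}_i$ by its $L_1$ norm corresponds to $\delta_i=1/k_i$, so that
\[
{\mathbf w}_{i,\delta}=\frac{1}{k_i}{\mathbf w}_i, \qquad \lambda_{0i,\delta}=k_i\lambda_{0i}, \qquad \lambda_{0i,\delta}{\mathbf w}_{i,\delta}=\lambda_{0i}{\mathbf w}_i .
\]
The product, and hence model (\ref{b1}), is unchanged, while the rescaled coefficient grows proportionally to $k_i$ and may therefore take values well outside the interval $[-1,1]$.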

In fact, when $p\rightarrow\infty$ and $k_i=O(p)$, the vectors ${\mathbf w}_i$ may change with $p$ and this may have an influence on the scale order of the process.
This happens, for example, if we consider a row-normalized spatial matrix ${\mathbf W}$, since the weights become infinitely small for infinitely large $p$. Looking at (\ref{diagonalize}), model (\ref{b1}) appears to become spatially uncorrelated for $p\rightarrow\infty$ because the weights in ${\mathbf W}$ vanish, so that matrix $\left[{\mathbf I}_p-D(\boldsymbol{\lambda}_0){\mathbf W}\right]$ tends to be asymptotically diagonal (for $p\rightarrow\infty$ and $T$ given). As a consequence, the model appears to become not identifiable in the high dimension with respect to the parameters $\lambda_{0i}$. To avoid this, we assume here that the coefficients $\lambda_{0i}$ may also depend on the dimension $p$, borrowing the idea of \cite{KelPru10}. In such a way, we can derive the conditions for the identifiability of the model in the high dimension and better convergence rates for the estimators. This is shown by the following theorem.

\begin{theorem}\label{theorem4}
Consider $\hat\lambda_{0i}$ and $\hat\lambda_{1i}$, the estimators obtained by the algorithm in section \ref{est_alg}. Assume that the number of nonzero values in ${\mathbf w}_i$ is $k_i=O(p)$ for all $i=1,\ldots,p$. Under assumptions $A1$--$A4$, for $p\rightarrow\infty$ we have the following cases:
\begin{itemize}
\item[(i)] if the vectors ${\mathbf w}_i$ are normalized by the $L_1$ norm then
\[
\left|\hat\lambda_{ji}-\lambda_{ji}\right| =O_p(T^{-1/2}) \nonumber\qquad\qquad  {\rm for\ } j=0,1;i=1,\ldots,p,
\]
provided that $\lambda_{0i}=O(p)$;
\item[(ii)] if the vectors ${\mathbf w}_i$ are normalized by the $L_2$ norm and $\lambda_{0i}=O(1)$ then
\[
\left|\hat\lambda_{ji}-\lambda_{ji}\right| =O_p(T^{-1/2}) \nonumber\qquad\qquad  {\rm for\ } j=0,1;i=1,\ldots,p;
\]
\item[(iii)] for generic (not normalized but bounded) vectors ${\mathbf w}_i$ and $\lambda_{0i}=O(1)$ we have
\[
\left|\hat\lambda_{ji}-\lambda_{ji}\right| =O_p(pT^{-1/2}) \nonumber\qquad\qquad  {\rm for\ } j=0,1;i=1,\ldots,p.
\]
\end{itemize}
\end{theorem}

As shown by cases (i) and (ii) of Theorem \ref{theorem4}, if we consider a row-normalized spatial matrix ${\mathbf W}$, our estimation procedure is consistent for any value of $p$ and with $p\rightarrow\infty$ at any rate. In other words, the convergence rate is not affected by the dimension $p$. However, there are some differences between the two cases of $L_1$ and $L_2$ normalization. In the former case, we need to impose that the spatial coefficients $\lambda_{0i}$ increase at the order $O(p)$ as $p\rightarrow\infty$ (otherwise the model becomes non-identifiable in the high dimension), whereas in the latter case of the $L_2$ norm they can remain constant for $p\rightarrow\infty$.
In case (iii), which is more general because it is valid for any ${\mathbf W}$, we need to impose $k_i=o(T^{1/2})$ in order to guarantee the consistency of the estimators.
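As a rough reading of case (iii), suppose (purely for illustration) that the dimension grows polynomially with the sample size, $p\asymp T^{\alpha}$ for some $\alpha>0$. Since the bound in case (iii) of Theorem \ref{theorem4} is of order $pT^{-1/2}$, we have
\[
pT^{-1/2}\asymp T^{\alpha-1/2},
\]
which vanishes only if $\alpha<1/2$. For instance, with $T=1000$ and $p=10$ the factor is $pT^{-1/2}\approx 0.32$, whereas $p=100$ with the same $T$ gives $pT^{-1/2}\approx 3.16$, and the bound is no longer informative.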

To complete this section, we characterize the class of processes that can be analysed by our \emph{stationary SDPD} model. Under assumption $A2$, any \emph{stationary SDPD} model can be equivalently represented as a VAR process as in (\ref{b1bis}), with respect to an autoregressive coefficient matrix ${\mathbf A}^*$ defined in (\ref{diagonalize}).
Now, by exploiting the simple structure of our model, we can state the conditions under which the converse is true. The following corollary derives from standard results.
\begin{corollary}\label{corollary1}
Given a stationary multivariate process ${\mathbf y}_t={\mathbf A}^*{\mathbf y}_{t-1}+\mbox{\boldmath$\varepsilon$}_t^*$, with $\mbox{\boldmath$\varepsilon$}_t^*$ satisfying assumption $A3$, a necessary and sufficient condition to represent the process ${\mathbf y}_t$ by a stationary SDPD model is that matrix ${\mathbf A}^*$ is diagonalizable. Therefore, matrix ${\mathbf A}^*$ must have $p$ linearly independent eigenvectors. This is (alternatively) assured by one of the following sufficient conditions:
\begin{itemize}
\item the eigenvalues $\lambda_{11},\ldots,\lambda_{1p}$ of matrix ${\mathbf A}^*$ are all distinct, or
\item the eigenvalues $\lambda_{11},\ldots,\lambda_{1p}$ of matrix ${\mathbf A}^*$ consist of $h$ distinct values $\mu_1,\ldots,\mu_h$ having geometric multiplicities $r_1,\ldots,r_h$, such that $r_1+\ldots+r_h=p$.
\end{itemize}
\end{corollary}
By Corollary \ref{corollary1} and assumptions $A1$--$A4$, the VAR processes that cannot be represented and consistently estimated by our \emph{stationary SDPD} model are those characterized by a matrix ${\mathbf A}^*$ with linearly dependent eigenvectors (i.e., those with repeated eigenvalues whose algebraic multiplicity exceeds the geometric multiplicity) or those with complex eigenvalues. In order to apply our model to those cases as well, we should generalize the estimation procedure using the Jordan decomposition of matrix ${\mathbf A}^*$. However, we leave this topic to future study.
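Two small examples, chosen here only for illustration and not taken from the simulation study, may help to fix ideas. For $|a|<1$, $0<r<1$ and $0<\theta<\pi$, consider
\[
{\mathbf A}^*_{J}=\left(\begin{array}{cc} a & 1\\ 0 & a\end{array}\right), \qquad
{\mathbf A}^*_{R}=r\left(\begin{array}{cc} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{array}\right).
\]
Both matrices have spectral radius smaller than one, so both define stationary VAR processes. Matrix ${\mathbf A}^*_{J}$ has the single eigenvalue $a$ with algebraic multiplicity two but only one linearly independent eigenvector, so it is not diagonalizable. Matrix ${\mathbf A}^*_{R}$ has the complex eigenvalues $r(\cos\theta\pm \mathrm{i}\sin\theta)$. Neither process falls in the class covered by the corollary and the assumptions above.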

\section{Simulation study}\label{simulazioni}

This section contains the results of a simulation study implemented to evaluate the performance of the proposed estimation procedure. In section 5.1, we describe the settings and check the validity of the assumptions for the simulated models. Then, in section 5.2, we evaluate the consistency of the estimation procedure and the convergence rate of the estimators using a known spatial matrix. Finally, in section 5.3, we analyse the case in which the spatial matrix ${\mathbf W}$ is unknown and therefore has to be estimated.



\subsection{Settings}

We consider three different spatial matrices. In the first case, we randomly generate a matrix of order $p\times p$, and we post-multiply this matrix by its transpose in order to force symmetry. The resulting spatial matrix is denoted by ${\mathbf W}_1$. Note that such a matrix is \emph{full}, and it may have positive and negative elements. In the other two cases, the spatial matrix is \emph{sparse} and has only positive entries: ${\mathbf W}_2$ is generated by setting to one only four values in each row, while ${\mathbf W}_3$ is generated by setting to one $2\sqrt{p}$ elements in each row.
For all three cases, we check the rank to guarantee that the spatial matrix has $p$ linearly independent rows. Moreover, we set to zero the main diagonal, and we rescale the elements so that each row has norm equal to one ($L_2$ row-normalization).
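As a quick check of the scale of the weights under this normalization, note that a row with $k$ entries equal to one has $L_2$ norm $\sqrt{k}$, so each nonzero weight becomes $k^{-1/2}$ after rescaling. The value $p=100$ below is taken only as an example:
\[
{\mathbf W}_2:\ k=4,\quad k^{-1/2}=0.5; \qquad
{\mathbf W}_3\ (p=100):\ k=2\sqrt{p}=20,\quad k^{-1/2}\approx 0.224 .
\]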

For the error process, we generate $p$ independent Gaussian series $e_{ti}$ with mean zero and standard deviation ${\sigma}_i$, where the values ${\sigma}_i$ are generated randomly from a uniform distribution $U(0.5, 1.5)$ for $i=1,\ldots,p$. Then, we define the cross-correlated error process $\mbox{\boldmath$\varepsilon$}_t=\{\varepsilon_{ti},i=1,\ldots,p\}$, for $t=1,\ldots,T$, where
\[
\left\{
\begin{array}{ll}
\varepsilon_{ti} = e_{ti} - 0.7\,e_{t2} & \mbox{for }i=3,\ldots,p, \\
\varepsilon_{ti} = e_{ti} & \mbox{otherwise}. \\
\end{array} \right.
\]
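Because the series $e_{ti}$ are independent across $i$, the second moments implied by this construction follow directly (the symbols $\mathrm{Var}$ and $\mathrm{Cov}$ are notation introduced here only for this remark). For $i,k\in\{3,\ldots,p\}$ with $i\neq k$,
\[
\mathrm{Var}(\varepsilon_{ti})=\sigma_i^2+0.49\,\sigma_2^2, \qquad
\mathrm{Cov}(\varepsilon_{ti},\varepsilon_{t2})=-0.7\,\sigma_2^2, \qquad
\mathrm{Cov}(\varepsilon_{ti},\varepsilon_{tk})=0.49\,\sigma_2^2,
\]
while $\varepsilon_{t1}$ is uncorrelated with all the other components. The errors are therefore both heteroskedastic and cross-correlated.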

We generate all $\lambda_{ji}$ from a uniform distribution $U(-0.7, 0.7)$. The settings above guarantee that assumptions $A1$--$A4$ hold. We generate different models with dimensions $p = (10, 50, 100, 500)$ and sample sizes $T = (50, 100, 500, 1000)$. Note that we may have $T\ll p$. For each configuration of settings, we generate 500 Monte Carlo replications of the model and report the estimation results. All analyses were performed in R.






\subsection{Empirical performance of the estimators when ${\mathbf W}$ is known}


Figure \ref{figure5} shows the box plots of the estimates for increasing sample sizes $T = (50, 100, 500, 1000)$ and fixed dimension $p = 100$. The four plots at the top refer to the estimation of $\lambda_{0i}$, while the four plots at the bottom refer to that of $\lambda_{1i}$. Each plot focuses on a different {location} $i$, where $i = 97,\ldots,100$. The true values of the coefficients $\lambda_{ji}$ are shown by the horizontal lines. Note that we have $T\leq p$ for the first two box plots in each plot, since $p = 100$ for this model. The box plots are centred on the true values of the parameters, and the variance reduces for increasing values of $T$, showing the consistency of the estimators and a good performance even for small $T$ and large $p$.

To evaluate the estimation error, for each realized time series, we compute the average error $AE$ and the average squared error $ASE$ using the equations below.
\begin{equation}\label{mse}
AE(\widehat{\boldsymbol{\lambda}}_j) = \frac{1}{p}\sum_{i=1}^p{(\hat\lambda_{ji}-\lambda_{ji})}, \qquad ASE(\widehat{\boldsymbol{\lambda}}_j) = \frac{1}{p}\sum_{i=1}^p{(\hat\lambda_{ji}-\lambda_{ji})^2}, \qquad j=0,1.
\end{equation}
Table \ref{tabella1} reports the mean values of $ASE(\widehat\boldsymbol{\lambda}_j)$ (with the standard deviations in brackets) computed over 500 simulated time series for different values of $T$ and $p$.

As shown in the table, the estimation error decreases when the sample size $T$ increases. It is interesting to note that the estimation error does not increase for increasing values of the dimension $p$. This is more evident from figure \ref{increasing_p_global}, which shows the box plots of the average errors $AE(\widehat{\boldsymbol{\lambda}}_0)$ (at the top) and $AE(\widehat{\boldsymbol{\lambda}}_1)$ (at the bottom) computed for 500 replications of the model, with varying values of $p$, sample sizes $T$, and spatial matrix ${\mathbf W}_1$. We can note from the figure that $\widehat{\boldsymbol{\lambda}}_0$ and $\widehat{\boldsymbol{\lambda}}_1$ are unbiased for all $T$ and $p$. Moreover, the variability of the box plots decreases for $p\rightarrow\infty$ and fixed $T$: this is a consequence of averaging the errors over the $p$ locations using equation (\ref{mse}).
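A heuristic for the last remark can be given under the simplifying assumption (not verified here, and only approximately plausible in this design) that the errors $d_i=\hat\lambda_{ji}-\lambda_{ji}$ are roughly uncorrelated across locations with a common variance $v_T$. In that case
\[
\mathrm{Var}\left\{AE(\widehat{\boldsymbol{\lambda}}_j)\right\}=\frac{1}{p^2}\sum_{i=1}^p \mathrm{Var}(d_i)\approx\frac{v_T}{p},
\]
so the spread of $AE$ shrinks with $p$ for fixed $T$ even though the individual errors do not improve. By contrast, the identity
\[
ASE(\widehat{\boldsymbol{\lambda}}_j)=AE(\widehat{\boldsymbol{\lambda}}_j)^2+\frac{1}{p}\sum_{i=1}^p\left\{d_i-AE(\widehat{\boldsymbol{\lambda}}_j)\right\}^2
\]
shows that $ASE$ does not benefit from the cancellation of positive and negative errors.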






\subsection{Estimation results when the spatial matrix is unknown}\label{matrixA}
In this section, we evaluate the performance of the proposed estimation procedure when the spatial matrix ${\mathbf W}$ is unknown and needs to be estimated.
In this case, the estimation error has to be evaluated with respect to matrix ${\mathbf A}^*$ in order to include the effects of both $\hat\boldsymbol{\lambda}_{j}$ and $\hat{\mathbf W}$ on the final estimates. So, using (\ref{diagonalize}), we define the two estimators
\begin{eqnarray}
\hat{\mathbf A}_{SDPD}^*({\mathbf W}) &=& \left[{\mathbf I}_p-D(\hat\boldsymbol{\lambda}_0){\mathbf W}\right]^{-1}D(\hat\boldsymbol{\lambda}_1)\left[{\mathbf I}_p-D(\hat\boldsymbol{\lambda}_0){\mathbf W}\right] \quad\mbox{and} \label{AW}\\
\hat{\mathbf A}_{SDPD}^*(\hat{\mathbf W}) &=& \left[{\mathbf I}_p-D(\hat\boldsymbol{\lambda}_0)\hat{\mathbf W}\right]^{-1}D(\hat\boldsymbol{\lambda}_1)\left[{\mathbf I}_p-D(\hat\boldsymbol{\lambda}_0)\hat{\mathbf W}\right],\label{AWhat}
\end{eqnarray}
where matrix ${\mathbf W}$ is assumed to be known in the first case and unknown in the second. When ${\mathbf W}$ is unknown, we estimate it by the (row-normalized) correlation matrix at lag zero, but other more efficient estimators of ${\mathbf W}$ can be considered alternatively.

For the sake of comparison, recalling the \emph{VAR} representation of our model in (\ref{b1bis}), we also estimate matrix ${\mathbf A}^*$ using the classic Yule--Walker estimator of the VAR model, $\hat{\mathbf A}_{VAR}^*=\hat\boldsymbol{\Sigma}_0^{-1}\hat\boldsymbol{\Sigma}_1$.

To give a measure of the estimation error, we define
\begin{equation}\label{mse2}
ASE({\mathbf A}^*_{(1)})= \frac{1}{p}\sum_{i=1}^p{(\hat A^*_{1i}-A^*_{1i})^2},
\end{equation}
where $A^*_{1i}$ for $i=1,\ldots,p$ are the true coefficients in the first row of matrix ${\mathbf A}^*$, and $\hat A^*_{1i}$ are their estimated values.
The box plots in figure \ref{figure6} summarize the results of the estimations from 500 replications of the model with $p=100$ (at the top) and $p=500$ (at the bottom). We report the average squared error computed by (\ref{mse2}) in three different cases: the classic Yule--Walker estimator of the VAR model $\hat{\mathbf A}_{VAR}^*$ on the left, our estimator $\hat{\mathbf A}_{SDPD}^*({\mathbf W})$ proposed in (\ref{AW}) with the known spatial matrix in the middle, and our estimator $\hat{\mathbf A}_{SDPD}^*(\hat{\mathbf W})$ proposed in (\ref{AWhat}) with the estimated spatial matrix on the right.

Figure \ref{figure6} shows interesting results. First, note that the classic estimator $\hat{\mathbf A}_{VAR}^*$ cannot be applied when $T\leq p$, and this is a serious drawback of the classic VAR models. On the other hand, the \emph{stationary SDPD} model can be used equivalently to represent the same process, but it always produces an estimate for all values of $T$ and $p$, regardless of whether ${\mathbf W}$ is known or unknown. Moreover, if we compare the box plots, we can note that both the median and the variability of the errors of the estimators $\hat{\mathbf A}_{SDPD}^*({\mathbf W})$ and $\hat{\mathbf A}_{SDPD}^*(\hat{\mathbf W})$ are remarkably lower than those of the classic estimator $\hat{\mathbf A}_{VAR}^*$ (when available) for all sample sizes $T$ and dimensions $p$. This deserves a further remark: while it is expected that the estimator $\hat{\mathbf A}_{SDPD}^*({\mathbf W})$ performs better than $\hat{\mathbf A}_{VAR}^*$ (given that it exploits the knowledge of the true spatial matrix ${\mathbf W}$), it is surprising to see that the estimator $\hat{\mathbf A}_{SDPD}^*(\hat{\mathbf W})$ also outperforms the classic estimator $\hat{\mathbf A}_{VAR}^*$, notwithstanding the fact that they operate under the same conditions (only the time series ${\mathbf y}_t$ is observed and no spatial matrix is known). Of course, the ASE of the estimator $\hat{\mathbf A}_{SDPD}^*(\hat{\mathbf W})$ slightly increases compared to that of the estimator $\hat{\mathbf A}_{SDPD}^*({\mathbf W})$, but its variability remains more or less the same.
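The failure of the Yule--Walker estimator for $T\leq p$ can be seen from a simple rank argument. Assuming the usual centred sample covariance (the exact definition of $\hat\boldsymbol{\Sigma}_0$ and the symbols $\bar{\mathbf y}$ and $^{\prime}$ are assumptions made here for illustration), we have
\[
\hat\boldsymbol{\Sigma}_0=\frac{1}{T}\sum_{t=1}^T({\mathbf y}_t-\bar{\mathbf y})({\mathbf y}_t-\bar{\mathbf y})^{\prime}, \qquad
\mathrm{rank}(\hat\boldsymbol{\Sigma}_0)\leq\min(p,\,T-1),
\]
since the $T$ centred vectors sum to zero. Hence $\hat\boldsymbol{\Sigma}_0$ is singular whenever $T\leq p$, and the inverse required by $\hat{\mathbf A}_{VAR}^*$ does not exist.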










