\section{Abstract Information for Non-Gaussian Linear Discovery}\label{sec:method}

\input{algorithms/abslingam}

In this section, we introduce Abs-LiNGAM (\Cref{alg:abslingam}),
a strategy to exploit our results
on $\mat{T}$-abstraction
to speed up observational causal discovery
of linear non-Gaussian models, e.g., LiNGAM~\citep{shimizu2011directlingam}.
The intuition is that 
whenever we have a $\mat{T}$-abstraction
of an unknown model to learn,
we can exclude all the candidate solutions
not satisfying the graphical conditions we presented in the previous sections.
Furthermore,
in Abs-LiNGAM,
we demonstrate
how to infer
prior knowledge for the concrete model
from a small number
of paired concrete-abstract samples,
even when the abstract model
and the abstraction function
are unknown, and an abstract dataset is not directly available.
In the following,
we formalize the data-generation process
and the steps of Abs-LiNGAM\@.

\subsection{Data-Generation Process}\label{subsec:dgp}

As in many real-world applications,
where observations are produced
by sensors or other data-collecting devices,
we assume that samples
from the low-level concrete model
are significantly more abundant
than high-level abstract samples.
We formalize this
intuition
by defining two datasets
\begin{align}
  \dset_{\scm{L}} &\sim \dist{P}_{\scm{L}},\\
  \dset_J &\sim \dist{P}_{\scm{L}, \scm{H}},
\end{align}
where the former contains
concrete samples only
and the latter
paired observations
from the joint observational distribution
of both models,
such that $|\dset_{J}| \ll |\dset_{\scm{L}}|$.
Therefore,
we define the following
data-generating process,
where we produce
a significantly smaller number
of abstract samples:
\begin{align}
  \vec{e}^{(i)} &\sim \operatorname{Exponential}
                &\text{for } i=1,\dots,|\dset_{\scm{L}}|,\\
  \vec{x}^{(i)} &= \scm{L}(\vec{e}^{(i)})
                &\text{for } i=1,\dots,|\dset_{\scm{L}}|,\\
  \vec{y}^{(i)} &= \scm{H}(\gamma(\vec{e}^{(i)}))
                &\text{for } i=1,\dots,|\dset_{J}|.
\end{align}
Since we assume linear and non-Gaussian data,
the models are identifiable
in the limit of infinite data~\citep{shimizu2006linear}.
In \Cref{app:noisy},
we discuss preliminary results
to tackle an additional
scenario where
we consider
abstract observations
to be perturbed
by random noise.
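
The data-generating process of this subsection can be sketched in Python with NumPy.
This is a minimal illustration under assumptions that are not part of the method:
the concrete model is written as $\vec{x} = \mat{B}\vec{x} + \vec{e}$
with a strictly lower-triangular $\mat{B}$,
all sizes, weights, and the matrix \texttt{T} are hypothetical,
and abstract samples are computed as $\mat{T}^\tr\vec{x}$,
which coincides with $\scm{H}(\gamma(\vec{e}))$
only when $\scm{H}$ is a $\mat{T}$-abstraction of $\scm{L}$ (not checked here).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen so that |D_J| << |D_L|.
d, n_concrete, n_joint = 4, 10_000, 50

# Assumed concrete model: x = B x + e, with B strictly lower triangular (acyclic).
B = np.tril(rng.uniform(0.5, 1.5, size=(d, d)), k=-1)

# Hypothetical abstraction matrix (d x 2) with disjoint blocks:
# {X_0, X_1} -> Y_0 and {X_2, X_3} -> Y_1 (0-based indices).
T = np.array([[1.0, 0.0],
              [2.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.5]])

# Exponential (non-Gaussian) exogenous noise, one row per sample.
E = rng.exponential(scale=1.0, size=(n_concrete, d))

# Solve (I - B) x = e for every sample; rows of X are concrete samples.
X = np.linalg.solve(np.eye(d) - B, E.T).T

# Only the first n_joint concrete samples receive an abstract counterpart.
Y = X[:n_joint] @ T

D_L = X                   # concrete dataset
D_J = (X[:n_joint], Y)    # paired concrete-abstract dataset
```

Pairing the first \texttt{n\_joint} rows mirrors the index ranges of the equations above,
where abstract samples share the exogenous noise of their concrete counterparts.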

\subsection{Abs-LiNGAM}

\paragraph{T-Reconstruction.}
Since we assume a linear transformation,
we can fit the abstraction function
from the joint dataset~$\dset_J$
by solving a least-squares problem~\citep{trefethen2022numerical}.
Then,
for each abstract variable~$Y_i\in\set{Y}$,
we can identify
its set of relevant variables~$\hat{\Pi}_R(Y_i)$,
as
\begin{align}
  \hat{\Pi}_R(Y_i) = \{X_k \mid {[\hat{\vec{t}}_i]}_k \neq 0\}.
\end{align}
In practice,
we mask the coefficients
of the fitted abstraction
transformation~$\hat{\mat{T}}$
with a small threshold
to handle numerical instability,
which, whenever a sufficient number
of joint samples $|\dset_J|$ is available,
ensures that each relevant block pertains to a single abstract variable.
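
As a hedged sketch of this step,
the snippet below fits $\hat{\mat{T}}$ by ordinary least squares,
masks small coefficients,
and reads off the relevant blocks from the nonzero entries of each column.
The function names, the threshold value, and the toy data are assumptions made for illustration;
the section does not prescribe a specific threshold.

```python
import numpy as np

def fit_abstraction(X_joint, Y_joint, threshold=1e-2):
    """Least-squares estimate of T (d x b) with Y ~= X @ T, then masking.

    The threshold is a hypothetical default, not a value from the paper.
    """
    T_hat, *_ = np.linalg.lstsq(X_joint, Y_joint, rcond=None)
    T_hat[np.abs(T_hat) < threshold] = 0.0  # mask numerically tiny entries
    return T_hat

def relevant_blocks(T_hat):
    """For each abstract variable Y_i, return {k : [t_i]_k != 0} (0-based)."""
    return [set(np.flatnonzero(T_hat[:, i]).tolist())
            for i in range(T_hat.shape[1])]

# Toy paired data: 50 joint samples, 4 concrete and 2 abstract variables.
rng = np.random.default_rng(0)
T_true = np.array([[1.0, 0.0],
                   [2.0, 0.0],
                   [0.0, 1.0],
                   [0.0, 0.5]])
X_joint = rng.exponential(size=(50, 4))
Y_joint = X_joint @ T_true

T_hat = fit_abstraction(X_joint, Y_joint)
blocks = relevant_blocks(T_hat)
```

In this noiseless toy case the least-squares solution matches \texttt{T\_true} up to floating-point error,
so the threshold only removes rounding residue.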

\paragraph{Abstract Causal Discovery.}
Then, we focus on learning
the abstract causal structure
from data.
Since
we assume
abstract samples
to be scarce,
even in our simplified setting of linear and non-Gaussian models,
the abstract model
might not be discoverable
from the high-level samples
in the joint dataset~$\dset_J$ alone.
However,
after having identified the abstraction function,
we can use it on the concrete dataset
to abstract each sample as in
\begin{align}
  \dset_{\hat{\scm{H}}}= \{ \hat{\mat{T}}^\tr \vec{x} \mid \vec{x} \in \dset_{\scm{L}} \}.
\end{align}
In fact,
whenever
the target model
is a $\mat{T}$-abstraction,
the observational consistency property
ensures that
abstracting concrete samples
is equivalent to directly sampling
from the abstract distribution,
as in the data-generating process.
Then,
we can use the newly generated
abstract samples
with any causal discovery algorithm
for linear non-Gaussian models.
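
The abstraction of the concrete dataset amounts to a single matrix product.
The sketch below assumes samples are stored as rows
and uses a hypothetical fitted matrix \texttt{T\_hat};
the discovery algorithm applied afterwards is left open,
as in the text.

```python
import numpy as np

def abstract_dataset(X_concrete, T_hat):
    """Apply y = T_hat^T x to every row of X_concrete (shape n x d)."""
    return X_concrete @ T_hat

rng = np.random.default_rng(0)

# Hypothetical fitted abstraction (4 concrete, 2 abstract variables).
T_hat = np.array([[1.0, 0.0],
                  [2.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 0.5]])

# Stand-in for the large concrete dataset D_L.
X_concrete = rng.exponential(size=(10_000, 4))

# One abstract sample per concrete sample; this array can then be passed
# to any causal discovery routine for linear non-Gaussian models.
D_H_hat = abstract_dataset(X_concrete, T_hat)
```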

\paragraph{Concrete Causal Discovery.}
Finally,
we can use the constraints induced by the abstract model to speed up discovery
of the concrete causal model.
As an immediate consequence of \Cref{theo:connectivity},
the existence
of an abstract directed path
$Y_i \anc Y_j$
entails the existence
of at least one
concrete directed path
between variables
in the corresponding relevant blocks $\Pi_R(Y_i)$ and $\Pi_R(Y_j)$.
We cannot,
however,
directly
infer which
of the possibly many
ancestral relations
the concrete model contains.
On the other hand,
whenever an abstract path does \emph{not} exist,
we can infer that
any variable
in the source block
does not cause,
either directly or indirectly,
any variable in the target block.
We can therefore
restrict the search space
of the concrete causal discovery problem
by excluding
all solutions
that do not satisfy
the following set of constraints
\begin{equation}
  \begin{aligned}
  \set{K} = \{
    &X_k \centernot\anc X_h
    \mid
    \,X_k \in \Pi_R(Y_i)\\
    \land
    &\,X_h \in \Pi_R(Y_j)
    \land
    \,Y_i \centernot\anc Y_j
  \}.
  \end{aligned}
\end{equation}
We use the DirectLiNGAM algorithm~\citep{shimizu2011directlingam}
to solve the concrete causal discovery problem,
as it can integrate
prior knowledge
in the form of
forbidden directed paths
and thus restrict the set of candidate solutions.
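
The constraint set $\set{K}$ can be materialized as a prior-knowledge matrix.
The sketch below is illustrative and rests on several assumptions:
the abstract graph is given as an adjacency matrix with entry $(i,j)$ nonzero for an edge $Y_i \to Y_j$,
relevant blocks are disjoint sets of 0-based indices,
pairs within the same block are left unconstrained,
and the encoding (entry $(k,h)$ equal to $0$ forbids a directed path from $X_k$ to $X_h$, while $-1$ means no prior knowledge)
is a convention chosen here.
The orientation expected by a specific DirectLiNGAM implementation should be verified before use.

```python
import numpy as np

def abstract_reachability(A_abs):
    """R[i, j] is True iff a directed path Y_i ~> Y_j exists (Warshall closure)."""
    R = np.asarray(A_abs) != 0
    for m in range(R.shape[0]):
        # Add paths i ~> j that pass through the intermediate node m.
        R = R | (R[:, [m]] & R[[m], :])
    return R

def forbidden_paths(blocks, R, d):
    """Encode K: entry (k, h) is 0 if X_k must not be an ancestor of X_h, else -1."""
    K = -np.ones((d, d), dtype=int)
    for i, block_i in enumerate(blocks):
        for j, block_j in enumerate(blocks):
            # Assumption: pairs within the same block (i == j) stay unconstrained.
            if i != j and not R[i, j]:
                for k in block_i:
                    for h in block_j:
                        K[k, h] = 0
    return K

# Toy example: Y_0 -> Y_1, with blocks {X_0, X_1} and {X_2, X_3}.
A_abs = np.array([[0, 1],
                  [0, 0]])
blocks = [{0, 1}, {2, 3}]

R = abstract_reachability(A_abs)
K = forbidden_paths(blocks, R, d=4)
```

Here no abstract path runs from $Y_1$ to $Y_0$,
so the four ordered pairs from the second block to the first are forbidden,
while pairs from the first block to the second remain unconstrained.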
