\input{figures/mainresults}

\section{Experimental Results}\label{sec:experiments}

In this section,
we analyze
the performance
of Abs-LiNGAM (\Cref{alg:abslingam})
on simulated data.
In particular,
we test
whether
a small number
of paired concrete-abstract
observations
can reduce
the search space,
and thus the execution time,
of DirectLiNGAM~\citep{shimizu2011directlingam},
without compromising
the quality of the retrieved
concrete causal structure.
As a baseline,
we apply DirectLiNGAM
to the concrete dataset
without any abstract-induced
prior knowledge.

For each run,
we sample 
the parameters
of an abstraction function
and
of an abstract linear SCM\@.
We then generate
a concrete causal model
by sampling
one of the
possible
$\mat{T}$-concretizations
of the abstract model with \Cref{alg:samplingblocks}.
We provide
details on our experimental setup
in \Cref{app:dataset}
and additional results
in \Cref{app:additional}.


We study
the performance
of Abs-LiNGAM
for an increasing number
of paired samples~(\Cref{fig:expA}).
Since Abs-LiNGAM is a multi-step algorithm,
the quality of the retrieved
concrete causal model
depends on the correctness
of the abstraction function,
of the abstract data generated from it,
and of the abstract causal discovery step.
As expected,
whenever
the size of the paired dataset~$|\dset_J|$
is too small,
Abs-LiNGAM
wrongly identifies
concrete paths
as forbidden
and, unlike the baseline,
fails to retrieve the correct
concrete causal model.
However,
whenever the number
of paired samples
approaches the number
of concrete nodes~$|\set{X}|$,
Abs-LiNGAM
performs similarly
to the baseline
and correctly retrieves
the concrete causal model.
We observe
the same trend
for concrete graphs
of increasing size~(\Cref{fig:expB}),
where
the prior knowledge induced from the abstract model
significantly reduces
the execution time
compared to the baseline.

Furthermore,
we find that
bootstrapping abstract causal discovery, i.e.,
aggregating several iterations
on randomly extracted sub-datasets,
improves
the performance
on the downstream
concrete discovery task
without noticeably affecting
the execution time,
which is still dominated
by the final concrete causal discovery run.
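
As an illustration,
one way to formalize such an aggregation
is by edge frequency;
the symbols $B$, $a^{(b)}_{ij}$, $f_{ij}$, and~$\tau$
are introduced here only for exposition,
and the exact aggregation rule is an assumption
rather than a description of the implementation.
Let $a^{(b)}_{ij} \in \{0, 1\}$ indicate
whether the $b$-th run,
out of $B$ runs on randomly extracted sub-datasets,
returns an edge from abstract node~$i$ to abstract node~$j$.
A frequency-based aggregate could then be
\[
  f_{ij} = \frac{1}{B} \sum_{b=1}^{B} a^{(b)}_{ij},
\]
retaining the edge
whenever $f_{ij} \geq \tau$
for a chosen threshold $\tau \in (0, 1]$.
Each of the $B$ runs
operates on the abstract variables only.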



