%%%%%%%%%%%%%%%%
% DATA
%%%%%%%%%%%%%%%%

\subsection{Experimental Data}
\label{sec:synthetic_data}

\paragraph{Synthetic DAGs} We validated theoretical guarantees on four causally sufficient DAG structures (Figures \ref{fig:ten_node_dag}, \ref{fig:m_butterfly}, \ref{fig:dag_17}, \ref{fig:complex_backdoor}) and one structure with hidden variables (Figure \ref{fig:latents}). In the discrete-data simulations, we used 12 data-generating processes for the 10-node DAG (Figure \ref{fig:ten_node_dag}), four processes for both 13-node DAGs (Figure \ref{fig:m_butterfly}), and two processes for the 17-node DAG (Figure \ref{fig:dag_17}). Causal mechanisms included both linear and nonlinear functions. In addition, six linear-continuous data-generating processes were simulated for the 10-node DAG (Figure \ref{fig:ten_node_dag}).

\paragraph{\textsc{Mildew} Benchmark} The \textsc{Mildew} network models fungicide use against powdery mildew in winter wheat \citep{jensen_midas_1996}. We selected one exposure-outcome pair (\textsc{mikro\_1} $\to$ \textsc{meldug\_2}) that meets sufficient conditions for LDP. All variables are categorical. $\z$ contains 31 nodes in $\{\z_1, \z_2, \z_4, \z_5, \z_8\}$, with a low proportion of confounders ($|\z_1| = 2$) and high proportion of colliders ($|\z_2| = 14$). Data were sampled using the \texttt{bnlearn} R package \citep{scutari_learning_2010}. Figure \ref{fig:mildew_full} further describes the DAG used for inference and evaluation.


\subsection{Baseline Methods}

\paragraph{PC Algorithm} PC is a classic global causal discovery algorithm that provides asymptotic theoretical guarantees \citep{spirtes_causation_2000}. It assumes the causal Markov condition, faithfulness, and causal sufficiency, and it returns a Markov equivalence class (MEC).
The worst-case time complexity for PC is exponential in the number of nodes, as demonstrated in Figure \ref{fig:test_curve}. Experiments use the implementation by \citet{kalisch_estimating_2007}\footnote{\href{https://github.com/keiichishima/pcalg}{https://github.com/keiichishima/pcalg}}, unless otherwise noted.

\paragraph{MB-by-MB}
MB-by-MB \citep{wang_discovering_2014} infers the local structure around a target node to distinguish parents from children. It sequentially learns Markov blankets (MBs) and the local structures within these, starting from the target node, moving to its neighbors, and so on. It terminates when the parents and children of the target are discovered or if it is not possible to distinguish them, returning the induced \emph{completed partially directed acyclic graph} (CPDAG) over the target and its neighbors. Experiments use an implementation that combines IAMB \citep[Fig.~2]{tsamardinos_algorithms_nodate} and PC
for every sequential step. As with PC, the worst-case time complexity of MB-by-MB is exponential in the number of nodes.

\paragraph{Local Discovery using Eager Collider Checks (LDECC)}
LDECC \citep{gupta_local_2023} is a local discovery algorithm that infers the induced CPDAG over a given target node and its neighbors. Unlike MB-by-MB, LDECC does not proceed sequentially; instead, it runs conditional independence tests in an order similar to that of PC and leverages discovered unshielded colliders to immediately orient the edges
around the target node. %This results in different computational properties than MB-by-MB. %The time complexity of orienting parents depends on the size of the separating sets for the unshielded colliders. 
LDECC is provably polynomial-time for certain categories of DAGs, but exponential for others.

\paragraph{Baseline Evaluation} %and Known Failure Modes 
Let $\mathbf{A}_{XY}$ be any adjustment set for $\{X,Y\}$ returned by a method in this study. Let $\mathbf{A}_{CC} \coloneqq \{\z_1\}$ and $\mathbf{A}_{DC} \coloneqq  \{\z_1, \z_4, \z_5\}$ be valid adjustment sets for $\{X,Y\}$ under the \emph{common cause criterion} (CCC) and \emph{disjunctive cause criterion} (DCC), respectively \citep{vanderweele_new_2011}.  
For PC, $\mathbf{A}_{CC} \coloneqq \text{ancestors}(X) \cap \text{ancestors}(Y) = \{\z_1\}$ and $\mathbf{A}_{DC} \coloneqq (\text{ancestors}(X) \cup \text{ancestors}(Y)) \setminus \text{descendants}(X) = \{\z_1, \z_4, \z_5\}$, where the ancestor and descendant relations must hold in every member of the MEC. As MB-by-MB and LDECC return only the direct parents and children of a single target, we run these baselines with $X$ and $Y$ as separate targets and cache intermediate results to prevent redundant independence testing. For these two methods, $\mathbf{A}_{DC} \coloneqq (\text{parents}(X) \cup \text{parents}(Y)) \setminus \text{children}(X) = \z_1' \cup \z_4 \cup \z_5$, where $\z_1'$ contains the members of $\z_1$ that are directly adjacent to $X$, $Y$, or both. Similarly, $\mathbf{A}_{CC} \coloneqq \text{parents}(X) \cap \text{parents}(Y)$, i.e., all confounders directly adjacent to both $X$ and $Y$. Because confounders that are not parents of both $X$ and $Y$ are omitted, $\mathbf{A}_{CC}$ under LDECC and MB-by-MB is not guaranteed to block all backdoor paths.
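
As a minimal illustration of this last point, consider a hypothetical DAG (not one of the benchmark structures in this study) with edges $Z \to X$, $Z \to Y$, $W \to B$, $B \to X$, $W \to Y$, $V \to Y$, and $X \to Y$. The node names are chosen for this example only, and we assume here that every set is restricted to covariates other than $X$ and $Y$. Under that assumption, the parent-based definitions give
\begin{align*}
\text{parents}(X) &= \{Z, B\}, \qquad \text{parents}(Y) = \{Z, W, V\}, \qquad \text{children}(X) = \emptyset, \\
\mathbf{A}_{CC} &= \text{parents}(X) \cap \text{parents}(Y) = \{Z\}, \\
\mathbf{A}_{DC} &= (\text{parents}(X) \cup \text{parents}(Y)) \setminus \text{children}(X) = \{Z, B, W, V\}.
\end{align*}
The backdoor path $X \leftarrow B \leftarrow W \to Y$ remains open given $\mathbf{A}_{CC} = \{Z\}$, whereas $\mathbf{A}_{DC}$ blocks both this path and $X \leftarrow Z \to Y$. By contrast, the ancestor-based definition, computed directly on this DAG for simplicity rather than over an MEC, yields $\text{ancestors}(X) \cap \text{ancestors}(Y) = \{Z, B, W\}$, which blocks both backdoor paths.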