%\documentclass{uai2024} % for initial submission
\documentclass[accepted]{uai2024} % after acceptance, for a revised version; 
% also before submission to see how the non-anonymous paper would look like 
                        
%% There is a class option to choose the math font
% \documentclass[mathfont=ptmx]{uai2024} % ptmx math instead of Computer
                                         % Modern (has noticeable issues)
% \documentclass[mathfont=newtx]{uai2024} % newtx fonts (improves upon
                                          % ptmx; less tested, no support)
% NOTE: Only keep *one* line above as appropriate, as it will be replaced
%       automatically for papers to be published. Do not make any other
%       change above this note for an accepted version.

%% Choose your variant of English; be consistent
\usepackage[american]{babel}
% \usepackage[british]{babel}
\input{math_commands}
\usepackage{float}
%% Some suggested packages, as needed:
\usepackage[authoryear]{natbib}
\usepackage{bibunits} % has a nice set of citation styles and commands
    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
% \usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams
\usetikzlibrary{shapes,positioning}
\usetikzlibrary{arrows.meta}

%% Provided macros
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)

%% Self-defined macros
\newcommand{\swap}[3][-]{#3#1#2} % just an example

\title{Localised Natural Causal Learning Algorithms for Weak Consistency Conditions}

% The standard author block has changed for UAI 2024 to provide
% more space for long author lists and allow for complex affiliations
%
% All author information is automatically removed by the class for the
% anonymous submission version of your paper, so you can already add your
% information below.
%
% Add authors
\author[1]{\href{mailto:<kai.teh.21@ucl.ac.uk>?Subject=Relaxation and construction of localised natural causal learning algorithms}{Kai~Z.~Teh}{}}
\author[1]{Kayvan Sadeghi}
\author[1]{Terry Soo}
% Add affiliations after the authors
\affil[1]{%
    Department of Statistical Science\\
    UCL, London, UK
}

  \begin{document}
  \begin{bibunit}[plainnat]
\maketitle

\begin{abstract}
By relaxing conditions for ``natural'' structure learning algorithms, a family of constraint-based algorithms containing all exact structure learning algorithms under the faithfulness assumption, we define localised natural structure learning algorithms (LoNS). We also provide a set of necessary and sufficient assumptions for consistency of LoNS, which can be thought of as a strict relaxation of the restricted faithfulness assumption. We provide a practical LoNS algorithm that runs in exponential time, which is then compared with related existing structure learning algorithms, namely PC/SGS and the relatively recent Sparsest Permutation algorithm. Simulation studies are also provided.
\end{abstract}

\section{Introduction}\label{sec:intro}
Inferring causal relationships has long been of great interest in many fields, with frameworks such as potential outcomes and graphs gaining prominence in the causality community. A main goal of graph-based causal inference is causal discovery: given data, we would like to uncover the underlying causal structure in the form of a true causal graph, on which conventional graph-based causal inference techniques hinge. We will mostly be concerned with the setting where only observational data are available, such as when interventional data in the form of randomised controlled trials are unavailable; in this setting, the true causal graph is recoverable up to its graphical separations. Current causal discovery approaches can generally be categorised into score-based approaches \citep{score} and constraint-based approaches \citep{const}. Here, we will mostly be concerned with the latter. 

Assumptions are needed for constraint-based approaches; otherwise, many causal structures representing the same data may be obtained, resulting in vacuous causal statements. Amongst these, the most common and widely known is the faithfulness assumption, under which every conditional independence in the data generating distribution is exactly represented by the true causal graph \citep{Faith}; under this assumption, most constraint-based learning approaches such as PC and SGS provably return the true causal graph up to its graphical separations. However, both in practice and in theory, the assumption can be too strong \citep{faithbad}. 

Efforts to relax the faithfulness assumption include the Sparsest Permutation (SP) algorithm by \citet{UhlSP}, which provably returns the graphical separations of the true causal graph under a strictly weaker assumption than faithfulness, at the expense of factorial run time from permuting the causal variables. Greedy approaches to speed up the SP algorithm \citep{UhlGSP, lam} have been proposed; however, these algorithms only return the true causal graph under strictly \emph{stronger} conditions than SP.

Addressing this, \cite{Sad} proposed the class of ``natural'' structure learning algorithms, which, under the faithfulness assumption, encompasses constraint-based approaches such as the SGS/PC algorithms. In addition, natural structure learning algorithms are proven to return the true causal graph up to graphical separations under well-defined assumptions that are shown to be strictly weaker than faithfulness. The objective of this paper is twofold: (1) to further weaken the consistency conditions by defining a localised version of natural structure learning algorithms; (2) to provide a practical algorithm of this type that works under these conditions. 

The structure of the paper will be as follows: Section \ref{bg} covers the relevant background, Section \ref{tnm} covers the theory and practical algorithm and Section \ref{sim} compares the algorithm with related existing algorithms, PC and SP. All proofs will be deferred to the supplementary material.


\section{Background}\label{bg}
\subsection{Graphs}
We first introduce the relevant concepts in graphical models, as well as some existing results in the literature. In this work, unless noted otherwise, every graph is implicitly assumed to be a \emph{directed acyclic graph} (DAG), that is, a graph over the set of nodes \(V=\{1,\ldots,n\}\) with directed edges such that there does not exist a sequence of directed edges from a node to itself. We denote by \(\text{an}_G(C)\) the set of nodes \(i\not \in C\) such that there exists a sequence of directed edges from \(i\) to some \(j\in C\) in \(G\).

We denote \(A\perp_G B\cd C\) as graphical separation in graph \(G\) 
%(in the case of DAGs, this can be understood as d-separation) 
between 
\(A,B\) given \(C\), where \(A,B,C \subseteq V\) are disjoint; in the case of DAGs, this can be understood as d-separation.  
A set of random variables \(X=(X_1,...,X_n)\) with joint distribution \(P\) is associated to the set of nodes \(V\). 
We denote \(A\ci B\cd C\) as conditional independence of \( (X_i)_{i\in A} \) and \( (X_j)_ {j\in B} \) given \( (X_k)_{k\in C} \). We relate the two notions using the \emph{Markov property}:
\begin{defn}[Markov property]
    A distribution \(P\) is \emph{Markovian} to \(G\) if   \(A\perp_G B\cd C \Rightarrow A\ci B\cd C\) for all disjoint \(A,B,C \subseteq V\).
\end{defn}
If we have the reverse implication as well, then we have faithfulness: 
\begin{defn}[Faithfulness]
   A distribution   \(P\) is \emph{faithful} to \(G\) if   \(A\perp_G B\cd C \iff A\ci B\cd C\) for all disjoint \(A,B,C \subseteq V\).
\end{defn}
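\begin{example}[Markovian but not faithful]
As a minimal illustration (the three-node chain is chosen here purely for exposition), let \(G\) be \(1\xrightarrow{}2\xrightarrow{}3\). Up to symmetry, the only graphical separation between nonempty sets in \(G\) is
\[
    1\perp_G 3\cd 2,
\]
so \(P\) is Markovian to \(G\) exactly when \(1\ci 3\cd 2\). Faithfulness additionally requires that \(P\) has no other conditional independencies. For instance, a distribution \(P\) satisfying both \(1\ci 3\cd 2\) and \(1\ci 3\) is Markovian to \(G\) but not faithful to \(G\), since \(1\) and \(3\) are not separated in \(G\) given the empty set.
\end{example}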
\citet{orderfaith} has shown that \(P\) being faithful to DAG \(G\) implies that \(P\) satisfies \emph{ordered upward stability} and \emph{ordered downward stability} wrt \(G\), defined in the case of DAGs, as follows:
\begin{defn}\hspace{1pt}
\begin{enumerate}
    \item  \emph{(Ordered upward stability (OUS)).}
    %\\ 
    \(P\) satisfies ordered upward stability wrt \(G\) if for all \(i,j,k\) and \(C\subseteq V\backslash\{i,j,k\}\), such that \(k\in \text{\emph{an}}_G(i,j)\), we have \(i\ci j\cd C\Rightarrow i\ci j\cd C\cup \{k\}\).
    \item  \emph{(Ordered downward stability (ODS)).}
    %\\ 
    \(P\) satisfies ordered downward stability wrt \(G\) if for all \(i,j,k\) and \(C\subseteq V\backslash\{i,j,k\}\), such that \(k\not\in \text{\emph{an}}_G(i,j,C)\), we have \(i\ci j\cd C\cup \{k\}\Rightarrow i\ci j\cd C\).
\end{enumerate}  
\end{defn}
If \(P\) is faithful to \(G\), then from \(P\) we can recover the true causal graph \(G\) up to its \emph{Markov equivalence class (MEC)}, defined as the set of all graphs that imply the same graphical separations. 

Denote \(\text{sk}(G)\) to be the \emph{skeleton} of graph \(G\), formed by removing all arrowheads from edges in \(G\). A \emph{v-configuration} is a  set of nodes \(i,k,j\) such that \(i\) and \(j\) are connected to \(k\), but \(i\) and \(j\) are not connected, and will be represented as \(i\sim k\sim j\). A v-configuration oriented as \(i\xrightarrow{}k\xleftarrow{}j\) is a \emph{collider}, otherwise the v-configuration is a \emph{non-collider}.

\begin{remark}
    Some authors allow nodes \(i\) and \(j\) of collider \(i\xrightarrow{}k\xleftarrow{}j\) to be adjacent, but we do not. If \(i\) is not adjacent to \(j\), then \(i\xrightarrow{}k\xleftarrow{}j\) is sometimes called an unshielded collider, but will simply be referred to as a collider here.
\end{remark}
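\begin{example}[Colliders and MECs on three nodes]
To make the notions of collider and MEC concrete, consider the following standard small case, stated under the definitions above. Let the skeleton be \(1\sim 2\sim 3\), with \(1\) and \(3\) not adjacent. The four possible orientations split according to their graphical separations as
\begin{align*}
    1\xrightarrow{}2\xrightarrow{}3,\quad 1\xleftarrow{}2\xleftarrow{}3,\quad 1\xleftarrow{}2\xrightarrow{}3 &: \quad 1\perp_G 3\cd 2,\\
    1\xrightarrow{}2\xleftarrow{}3 &: \quad 1\perp_G 3.
\end{align*}
In the first three DAGs, \(1\sim 2\sim 3\) is a non-collider, and these DAGs form one MEC. In the last DAG, \(1\sim 2\sim 3\) is a collider, and this DAG is the only member of its MEC.
\end{example}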

To relate \(\text{sk}(G)\) with distribution \(P\), we define \(\text{sk}(P)\) 
as follows:
\begin{defn}[\(\text{sk}(P)\)]
    Given a distribution \(P\), the skeleton  \(\text{sk}(P)\) is the undirected graph with node set \(V\), such that for all \(i,j\in V\), 
    the node
    \(i\) is adjacent to \(j\) if and only if there does not exist any \(C\subseteq V\backslash\{i,j\}\) such that \(i\ci j\cd C\).
\end{defn}
Note that \(\text{sk}(P)\) 
%defined 
is the output of the skeleton building step of the SGS/PC algorithms under faithfulness. 
%
\begin{defn}[Adjacency faithfulness]
    A distribution \(P\) is adjacency faithful wrt graph \(G\), if for all \(i,j\in V\), we have: 
    \(i\) adjacent to \(j\) in \(G \Rightarrow i\notci j\cd C\) for all \(C\subseteq V\backslash \{i,j\}\).
\end{defn}
Note that if \(P\) is Markovian to \(G\) and
adjacency faithful 
wrt \(G\), then  \(\text{sk}(P)=\text{sk}(G)\).
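
\begin{example}[Failure of adjacency faithfulness]
As a sketch of how adjacency faithfulness can fail (a two-node case chosen here only for illustration), let \(G\) be \(1\xrightarrow{}2\) and let \(P\) satisfy \(1\ci 2\). Since \(G\) implies no graphical separations between nonempty sets, \(P\) is trivially Markovian to \(G\). However,
\[
    1 \text{ is adjacent to } 2 \text{ in } G, \quad\text{while}\quad 1\ci 2\cd C \text{ for } C=\emptyset,
\]
so \(P\) is not adjacency faithful wrt \(G\). Accordingly, \(\mathrm{sk}(P)\) is the empty graph on \(\{1,2\}\), which differs from \(\mathrm{sk}(G)\).
\end{example}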


\subsection{Natural Structure Learning Algorithms}
Let \(P\) be Markovian to the true causal graph \(G_0\). A causal learning algorithm aims to recover the graph \(G_0\) up to its MEC, in which case we say that the algorithm is \emph{consistent}. To relax the faithfulness assumption, \citet{Sad} introduced \emph{natural  structure learning algorithms}.
\begin{defn}[Natural structure learning algorithm]
    An algorithm that takes distribution \(P\) as input and outputs DAG \(G(P)\) is natural if:
    \begin{enumerate}
        \item \(\text{\emph{sk}}(G(P))=\text{\emph{sk}}(P)\).
        \item \(P\) satisfies OUS and ODS wrt \(G(P)\).
    \end{enumerate}
\end{defn}

The following conditions on \(P\) and the true causal DAG \(G_0\) ensure the consistency of natural structure learning algorithms: 
\begin{defn}[V-stability]
    \(P\) is V-stable if
    %
    for all v-configurations \(i\sim k\sim j\) in \(\text{\emph{sk}}(P)\), and \(C\subseteq V\backslash \{i,j,k\}\), the 
    independencies 
    \(i\ci j\cd C\) and \(i\ci j\cd C\cup \{k\}\) cannot both hold.
\end{defn}
\begin{remark}
    This is a condition on \(P\) itself, and is implied by the well-known singleton transitivity axiom under adjacency faithfulness.
\end{remark}
\begin{prop}[Theorems 14 and 25 of \cite{Sad}]
\label{prop1}
The graphs    \(G(P)\) and \(G_0\) are Markov equivalent if the following holds:
    \begin{enumerate}  
        \item \(P\) satisfies adjacency faithfulness wrt \(G_0\).
        \item \(P\) satisfies ordered upward and downward stabilities wrt \(G_0\).
        \item \(P\) is V-stable.
    \end{enumerate}
\end{prop}
\begin{remark}
    In \cite{Sad}, Condition 1 above is given in terms of converse pairwise Markovian instead of adjacency faithfulness; this is due to attempts to characterise the consistency conditions in terms of structural equation models (SEM). However, only the weaker adjacency faithfulness is needed, and here we are focused on relaxing conditions.
\end{remark}
By Example 21 in \citet{Sad}, it can be seen that combined, these conditions are strictly weaker than \emph{restricted faithfulness}, which is the weakest known consistency condition for SGS/PC \citep{UhlSP}.

Under the faithfulness assumption, constraint-based structure learning algorithms are natural. However, it is unclear whether these algorithms are still natural structure learning algorithms once the faithfulness assumption is relaxed, and no concrete algorithm is provided in \cite{Sad}. Thus, without assuming faithfulness, we aim to provide a general concrete natural structure learning algorithm that relaxes the consistency conditions in Proposition \ref{prop1}. 

As usual in constraint-based causal learning, we assume the availability of a \emph{conditional independence oracle}: given a probability distribution \(P\), we can determine with certainty whether conditional independence statements are true. In practice, conditional independence statements need to be tested from the data using methods such as HSIC testing \citep{HSIC}, which has been shown to be a hard problem in general \citep{hard}.

\section{Theory and Methods}\label{tnm}
Here, we present our relaxation of the theory of natural structure learning algorithms and the practical algorithm.
\subsection{Theory}
\begin{defn}[V-OUS and collider-stability]\label{v-ousdef}
    A distribution \(P\) is V-OUS and collider-stable wrt DAG \(G\) if for all v-configurations \(i\sim  k\sim j\) in \(G\):
    \begin{enumerate}
        \item \emph{(V-Ordered upward stability (V-OUS))}. If \(i\sim  k\sim j\) is a non-collider,
        then
        for all \(C\subseteq V\backslash\{i,j,k\}\), 
        we have 
        \(i\ci j\cd C\Rightarrow i\ci j\cd C\cup \{k\}\).
        \item \emph{(Collider-stability)}. If \(i\xrightarrow[]{}k\xleftarrow{}j \), then  
        there exists \(C'\subseteq V\backslash\{i,j,k\}\)  such that \(i\ci j\cd C'\).
    \end{enumerate}
\end{defn}

%Collider-stable has related notions such as 
Collider-stability is related to 
\emph{orientation faithfulness}, which states that the graph is faithful up to v-configurations in the graph \citep{rams}. However,
collider-stability
is much weaker: it is implied even by the Markov property.
\begin{prop}[Collider-stability is very weak]\label{weak}
    If \(P\) is Markovian to \(G\), then \(P\) is collider-stable wrt \(G\).
\end{prop}
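\begin{remark}
    One way to see why Proposition \ref{weak} holds is sketched here, assuming the standard local Markov property for DAGs; the notation \(\mathrm{pa}_G(j)\) for the set of parents of \(j\) in \(G\) is introduced only for this sketch, and the proof in the supplementary material may proceed differently. Let \(i\xrightarrow{}k\xleftarrow{}j\) be a collider in \(G\), so that \(i\) and \(j\) are not adjacent. By acyclicity, \(i\) and \(j\) cannot both be descendants of each other, so without loss of generality \(i\) is not a descendant of \(j\). Taking \(C'=\mathrm{pa}_G(j)\), we have
    \[
        i\perp_G j\cd C' \quad\Rightarrow\quad i\ci j\cd C',
    \]
    where the separation holds because a node is separated from its non-descendants given its parents, and the implication is the Markov property. Finally, \(C'\subseteq V\backslash\{i,j,k\}\), since \(i\) is not adjacent to \(j\), and \(k\) is a child of \(j\) and hence not a parent of \(j\).
\end{remark}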


V-OUS and collider-stability
can be seen as local versions of ordered stabilities for the purposes of learning DAGs. In the case of DAGs, V-OUS can be seen as a relaxation of OUS, since the implication \(i\ci j\cd C\Rightarrow i\ci j\cd C\cup \{k\}\) in Definition \ref{v-ousdef} only has to hold for \(i,j,k\) that form a non-collider \(i\sim k\sim j\) in \(G\). Likewise, collider-stability is implied by ODS and can be seen as a relaxation of it.

\begin{defn}[Localised Natural Structure learning (LoNS) algorithm]\label{lonsdef}An algorithm that takes input distribution \(P\), and outputs \(G(P)\) is localised natural if:
    \begin{enumerate}
        \item \(\text{\emph{sk}}(P)=\text{\emph{sk}}(G(P))\).
        \item \(P\) is V-OUS and collider-stable wrt \(G(P)\).
    \end{enumerate}
\end{defn}
Note that the above is the same as the definition of natural structure learning algorithms, except that one of the requirements is relaxed, namely Condition 2 in Definition \ref{lonsdef}. 
%As such, 
Thus, just like natural structure learning algorithms, all constraint-based algorithms that work under faithfulness are localised natural.

To characterise all DAGs that could be the output of a LoNS algorithm, we introduce the following orientation rule:
\begin{defn}[V-OUS and collider-stable orientation rule wrt \(P\)]\label{rule}
    A \emph{V-OUS and collider-stable orientation rule wrt \(P\)} is defined as an assignment of v-configurations \(i\sim k\sim j\) in \(\text{\emph{sk}}(P)\) into colliders and non-colliders as follows:
    \begin{enumerate}
        \item If \(i\ci j\cd C\) and \(i\notci j\cd C\cup\{k\}\) for some \(C\subseteq V\backslash \{i,j,k\}\), then assign \(i\sim k\sim j\) to be a collider.
        \item If for all \(C\) such that \(i\ci j\cd C\), we have \(k\in C\), then assign \(i\sim k\sim j\) to be a non-collider.
    \end{enumerate}
\end{defn}
A DAG \(G\) is said to satisfy the V-OUS and collider-stable orientation rule wrt \(P\), if \(G\) satisfies: 
\begin{enumerate}
    \item\label{i1} \(\text{sk}(P)=\text{sk}(G)\).
    \item\label{i2} For all v-configurations \(i\sim k\sim j\) in \(G\), via the orientation rule in Definition \ref{rule},
\begin{itemize}
    \item[]if \(i\sim k\sim j\) is assigned to be a collider or non-collider, then \(i\sim k\sim j\) is a collider or non-collider, respectively in \(G\).
\end{itemize}
\end{enumerate}
We have the following characterisation:
\begin{prop}[Characterisation]\label{mnsrep}
   The DAG \(G\) satisfies the V-OUS and collider-stable orientation rule wrt \(P\), if and only if \(P\) satisfies:
   \begin{enumerate}
        \item \(\text{\emph{sk}}(P)=\text{\emph{sk}}(G)\).
        \item \(P\) is V-OUS and collider-stable wrt \(G\).
    \end{enumerate}
\end{prop}

We can now apply the V-OUS and collider-stable orientation rule wrt \(P\) to assign the v-configurations in \(\text{sk}(P)\). Note that the assignment may be incomplete, in the sense that some v-configurations may not satisfy either of the conditions in Definition \ref{rule} and are therefore unassigned. \emph{Modified V-stability} is then defined as when this ambiguity can be resolved using the constraint that the graph is a DAG, as illustrated in Figure \ref{exmodv}.
\begin{defn}[Modified V-stability]
    A distribution \(P\) is \emph{modified V-stable} if all DAGs that satisfy the V-OUS and collider-stable orientation rule wrt \(P\) have the same assignment of v-configurations into colliders and non-colliders.
\end{defn}

\begin{figure}[h]
\begin{center}
\includegraphics[width=\linewidth]{images/modv.pdf}
\end{center}
\caption{Different assignments of v-configurations in \(\text{sk}(P)\). Labels \(\mathcal{N}\), \(\mathcal{C}\) and \(\mathcal{U}\) denote v-configurations that are assigned to be a non-collider, assigned to be a collider, and unassigned, respectively. In a), the unassigned v-configuration \(4\sim 3\sim 2\) is constrained to be a collider due to acyclicity of a DAG. In b), since a DAG has no bidirected edges, the unassigned v-configuration \(1\sim 2 \sim 3\) is constrained to be a non-collider. In c), since there exist DAGs in which the v-configuration \(2\sim 3\sim 4\) is a collider and DAGs in which it is a non-collider, this ambiguity cannot be resolved. \\
\(P\) is modified V-stable if, after orienting \(\mathrm{sk}(P)\) via the V-OUS and collider-stable orientation rule wrt \(P\), the unassigned v-configurations in \(\mathrm{sk}(P)\) can be resolved using DAG constraints,  as in a) and b).}
\label{exmodv}
\end{figure}

Under V-stability, all v-configurations in \(\text{sk}(P)\) must satisfy either of the conditions in Definition \ref{rule}, leaving no v-configurations in \(\text{sk}(P)\) unassigned. 
%As such,
Thus, 
V-stability implies modified V-stability.
\begin{remark}\label{modvequi}
    Combined with Proposition \ref{mnsrep}, we have the following equivalent notion of modified V-stability: all DAGs \(G\) to which \(P\) satisfies:
    \begin{enumerate}
        \item \(\text{\emph{sk}}(P)=\text{\emph{sk}}(G)\).
        \item \(P\) is V-OUS and collider-stable wrt \(G\).
    \end{enumerate}
    are Markov equivalent.
\end{remark}


We have the following, for the true causal graph \(G_0\):
\begin{theorem}[Sufficient and necessary consistency conditions for LoNS]\label{prop2}
 LoNS algorithms are consistent if and only if:
 \begin{enumerate}
     \item \(P\) satisfies adjacency faithfulness wrt \(G_0\).
     \item \(P\) is V-OUS and collider-stable wrt \(G_0\).
     \item \(P\) is modified V-stable.
 \end{enumerate}
\end{theorem}
The following example shows that even when combined with \(P\) being adjacency faithful and V-OUS and collider-stable wrt \(G_0\), V-stability of \(P\) need not be implied:
\begin{example}
Let \(G_0\) be \(1\xrightarrow{}2\xrightarrow{}3\xleftarrow{}4\), and let \(P\) induce all the conditional independencies implied by the Markov property wrt \(G_0\), in addition to \(1\ci 3\). 

The v-configuration \(2\sim 3\sim 4\) in \(\mathrm{sk}(P)\) satisfies \(2\ci 4\) and \(2\notci 4\cd 3\), so it is assigned as a collider; this in turn constrains \(1\sim 2\sim 3\) to be a non-collider. Thus \(P\) is modified V-stable. Adjacency faithfulness is obvious.

Since we have \(1\ci 3\) and \(1\ci 3\cd 2\), we have that \(P\) is not V-stable, but V-OUS holds since 
\(1\ci 3\cd \{2, 4\}\). 
\end{example}

Thus the conditions in Theorem \ref{prop2} are strictly weaker than those in Proposition \ref{prop1}, which are already weaker than restricted faithfulness, and we will see in Section \ref{sim} that these conditions are different from the sufficient and necessary conditions of SP.

\subsubsection{Realising and Interpreting the V-OUS Condition}
V-OUS is implied by faithfulness. Here, without assuming faithfulness, we discuss cases in which the V-OUS property can still arise, and provide basic interpretations.

\begin{prop}[Conditional exchangeability and composition imply V-OUS]\label{exc}
    Let \(P\) satisfy:
    \begin{enumerate}
        \item (Composition property). For all disjoint \(i,j,k,C\), the following holds: \(i\ci j\cd C \And i\ci k\cd C \Rightarrow i\ci \{j, k\}\cd C\).
        \item (Conditional exchangeability). For all non-collider v-configurations \(i\sim k\sim j\) in \(G_0\), the marginal distribution of \(P\) on \(\{i,j,k\}\) conditioned on \(V\backslash\{i,j,k\}\) is exchangeable.
    \end{enumerate}
    Then \(P\) satisfies V-OUS wrt \(G_0\).
\end{prop}

The composition property allows the deduction of joint independence from pairwise independence, and is satisfied by some common distributions such as Gaussians. The exchangeability assumption is commonly made when nodes are indistinguishable from one another, such as in Bayesian theory. 

The V-OUS assumption can be interpreted as a prevention of Simpson's paradox on non-collider v-configuration \(i\sim k\sim j\), since all conditional independencies of \(i\) and \(j\) are preserved when conditioning on \(k\). 

\subsection{Construction of a LoNS algorithm}
Having described the LoNS algorithms, we provide pseudocode for one such algorithm:
\begin{algorithm}
  \caption{\textbf{M}odifi\textbf{e}d V-stable \textbf{Lo}calised \textbf{N}atural \textbf{S}tructure Learning (Me-LoNS)}
  \textbf{Input}: Probability distribution \(P\)\\
  \textbf{Output}: DAG \(G(P)\)
  \begin{algorithmic}[1]
    \State Construct \(\text{sk}(P)\).
    \State Apply the V-OUS and collider-stable orientation rule wrt \(P\) to assign v-configurations in \(\text{sk}(P)\).
    \State Solve for a DAG \(G(P)\) that has skeleton \(\text{sk}(P)\) and satisfies the assignment of v-configurations. If no solution exists, \textbf{return} error.
    \State \textbf{return} \(G(P)\).
  \end{algorithmic}
\end{algorithm}

\begin{remark}
    Generally, Me-LoNS differs from PC \citep{const} only in how it determines whether v-configurations in \(\text{\emph{sk}}(P)\) are colliders. 
    
    Since we have to check conditional independence statements for all conditioning subsets, the algorithm has exponential time complexity, which is comparable to the skeleton building step of PC and is a big improvement over the factorial running time of SP.

    Since the running time of greedy versions of SP based on depth-first search \citep{UhlGSP, lam} generally depends on the depth parameter, it is not obvious whether the running time of Me-LoNS is an improvement over these.
\end{remark}
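\begin{remark}
    The comparison of running times can be made concrete with a rough count of oracle queries. This is a back-of-the-envelope bound given only as a sketch, not a tight analysis. For \(n\) nodes, the skeleton step queries at most every pair \(i,j\) against every \(C\subseteq V\backslash\{i,j\}\), and the orientation rule queries \(i\ci j\cd C\) and \(i\ci j\cd C\cup\{k\}\) for every \(C\subseteq V\backslash\{i,j,k\}\), giving at most
    \[
        \binom{n}{2}\,2^{n-2} \quad\text{queries for the skeleton and}\quad 2\cdot 2^{n-3}=2^{n-2} \quad\text{queries per v-configuration,}
    \]
    whereas the number of permutations of the \(n\) variables is \(n!\). For example, with \(n=10\),
    \[
        \binom{10}{2}\,2^{8}=45\times 256=11{,}520, \qquad 10!=3{,}628{,}800.
    \]
\end{remark}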


Note that Me-LoNS outputs a DAG. Since in the observational causal learning setting we are interested in the corresponding MEC, we can always convert the DAG into a CP-DAG, which is a graphical object uniquely representing a MEC, for example via the \verb|dag2cpdag| function in the \verb|causal-learn| Python package \citep{python}.

\begin{prop}[Me-LoNS is a LoNS algorithm]\label{propconst} 
Me-LoNS is a LoNS algorithm if and only if there exists a DAG \(G\) to which \(P\) satisfies the following:
\begin{enumerate}
    \item \(\text{\emph{sk}}(P)=\text{\emph{sk}}(G)\).
    \item \(P\) is V-OUS and collider-stable wrt \(G\).
\end{enumerate}
\end{prop}

The consistency conditions of Me-LoNS are then given in Theorem \ref{prop2}. Note that modified V-stability of the input distribution \(P\) ensures that the output of Me-LoNS is unique up to MEC. 
\begin{remark}
    The orientation rule of Me-LoNS is similar to conservative PC (CPC) \citep{rams} in the sense that:
    \begin{enumerate}
        \item Both assign non-colliders similarly.
        \item Both allow for 
        %ambiguous/unassigned
        ambiguous or unassigned 
        v-configurations. 
        \item Both have a criterion for detecting when the consistency condition relating distribution \(P\) and true causal graph \(G_0\) is violated; if Me-LoNS returns an error, there is no DAG that satisfies the conditions in Proposition \ref{propconst} (by applying the characterisation in Proposition \ref{mnsrep}). 
    \end{enumerate}
    However, Me-LoNS relaxes the restricted-faithfulness condition of CPC by orienting colliders differently.
\end{remark}
\section{Simulation and Theoretical Comparisons}\label{sim}
We will compare the consistency conditions of Me-LoNS to some existing constraint-based causal learning algorithms both theoretically and via simulations using the \verb|causal-learn| package \citep{python} in Python. Me-LoNS is implemented via the following steps:
\begin{enumerate}
    \item Use the same skeleton discovery function as PC.
    \item Make a new orientation function based on the new orientation rule.
    \item Use the \verb|scipy.optimize| package to solve the DAG search problem.
\end{enumerate}
We use mixed-integer linear programming with a constant objective to solve Step 3 of Me-LoNS. In addition to the layered network (LN) formulation from \citet{shoj}, we introduce additional constraints from Step 2 of Me-LoNS as follows:
\begin{align*}
    z_{ik}=z_{jk}=1 && \forall i\sim k\sim j\in \mathcal{C}\\
    z_{ik}+z_{jk}\leq 1 && \forall i\sim k\sim j\in \mathcal{N}
\end{align*}
where \(z_{ij}=1\) if \(i\xrightarrow[]{}j\) and \(z_{ij}=0\) if \(i\xleftarrow[]{}j\), and \(\mathcal{C}\) and \(\mathcal{N}\) are the sets of v-configurations in \(\text{sk}(P)\) that are assigned to be colliders and non-colliders, respectively, by the V-OUS and collider-stable orientation rule wrt \(P\) in Step 2 of Me-LoNS. 
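
\begin{example}[Constraints on a four-node path]
To see how these constraints act, consider the example following Theorem \ref{prop2}, worked by hand here as a sketch. We use the convention \(z_{ji}=1-z_{ij}\) for adjacent \(i,j\), which we read off from the definition of \(z_{ij}\) above, and we omit the acyclicity constraints of the LN formulation, since every orientation of a path is acyclic. There, \(\mathrm{sk}(P)\) is the path \(1\sim 2\sim 3\sim 4\), with \(\mathcal{C}=\{2\sim 3\sim 4\}\) and \(\mathcal{N}=\emptyset\), the v-configuration \(1\sim 2\sim 3\) being unassigned. The added constraints are
\[
    z_{23}=z_{43}=1,
\]
and \(z_{12}\in\{0,1\}\) remains free. The two feasible solutions are \(1\xrightarrow{}2\xrightarrow{}3\xleftarrow{}4\) (\(z_{12}=1\)) and \(1\xleftarrow{}2\xrightarrow{}3\xleftarrow{}4\) (\(z_{12}=0\)). In both,
\[
    z_{12}+z_{32}=z_{12}+(1-z_{23})=z_{12}\leq 1,
\]
so \(1\sim 2\sim 3\) is a non-collider in every solution, and the two solutions are Markov equivalent.
\end{example}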

Within each comparison, we simulate data from the same structural equation model (SEM), with corresponding causal graph \(G_0\), to obtain a total of 1,000,000 samples. These samples are then subdivided into 100 test units of 10,000 samples each. Over these 100 tests, we compare the percentage of tests in which each algorithm returns a consistent output (an output that is Markov equivalent to \(G_0\)). Whenever conditional independence testing is needed, the \verb|fisherz| conditional independence test from the package is used with a significance level of 0.05. To test for Markov equivalence of the true causal graph \(G_0\) and the output graph of the algorithm, the \verb|mec_check| function is used. 
\begin{remark}
    Although Me-LoNS is deterministic, due to conditional independence testing being used in simulations, the simulation output is non-deterministic. The simulations aim to investigate how well the theoretical results (stated purely in conditional independencies) hold up under the randomness of conditional independence testing.
\end{remark}

\subsection{Comparison to the PC algorithm}

Me-LoNS strictly generalises PC, as in the following:
\begin{prop}[Me-LoNS strictly generalises PC]\label{pcsucks}
    If \(P\) is V-stable, then the outputs of both PC and Me-LoNS are Markov equivalent. Furthermore, there exist a distribution \(P\) and a true causal graph \(G_0\) such that Me-LoNS is consistent but PC is not.
\end{prop}
\begin{remark}
    In general, PC outputs a representative of a MEC (CP-DAG). Proposition \ref{pcsucks} states that under V-stability, Me-LoNS returns a DAG that is of the MEC represented by the CP-DAG output of PC regardless of violations of consistency conditions.
\end{remark}
\begin{align}
  \epsilon_i &\stackrel{\text{i.i.d.}}{\sim} N(0,1),  i=1,2,3,4\nonumber\\
  X_1 &= \epsilon_1\nonumber\\
  X_2 &=  \epsilon_2\nonumber\\
  X_3 &=  -6X_1+2X_2+\epsilon_3\nonumber\\
  X_4 &=   3X_1+4X_2+\epsilon_4\label{sem2}
\end{align}

\begin{figure}[h]
\begin{center}
\includegraphics[width=0.9\linewidth]{images/drawpc.pdf}
\end{center}
\caption{SEM \ref{sem2} corresponds to the DAG \(G_0\).}
\label{ex2}
\end{figure}

To illustrate Proposition \ref{pcsucks}, we compare Me-LoNS with the PC algorithm from the package using the \verb|definiteMaxP| orientation rule which orients only definite colliders and definite non-colliders (thus in this setting, PC coincides with CPC). The input distribution \(P\) will be induced by SEM \ref{sem2}, having all the conditional independencies implied by the Markovian property wrt \(G_0\) in Figure \ref{ex2}, in addition to 
\(X_1\ci X_2 \cd \{X_3,X_4\}\). 

PC fails to identify the colliders in Figure \ref{ex2}, due to violation of orientation faithfulness. This is reflected in Table \ref{pcres}.
\begin{table}
    \centering
    \caption{Percentage of simulations from SEM \ref{sem2} in which each algorithm returns a consistent output.}\label{pcres}
    \begin{tabular}{cc}
      \toprule % from booktabs package
      \bfseries PC & \bfseries Me-LoNS\\
      \midrule % from booktabs package
      8\% & 90\%\\
      \bottomrule % from booktabs package
    \end{tabular}
\end{table}
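
\begin{remark}
    The following hand computation is a sketch of why the orientation rule in Definition \ref{rule} identifies the colliders of SEM \ref{sem2}. It is our own check based on the coefficients of SEM \ref{sem2} and the standard formula for Gaussian partial covariance, and it is stated for the population distribution rather than for finite samples. From SEM \ref{sem2}, \(\mathrm{Cov}(X_1,X_2)=0\), \(\mathrm{Cov}(X_1,X_3)=-6\), \(\mathrm{Cov}(X_2,X_3)=2\), \(\mathrm{Var}(X_3)=41\), \(\mathrm{Cov}(X_1,X_4)=3\), \(\mathrm{Cov}(X_2,X_4)=4\) and \(\mathrm{Var}(X_4)=26\), so that
    \begin{align*}
        \mathrm{Cov}(X_1,X_2\mid X_3) &= 0-\frac{(-6)(2)}{41}=\frac{12}{41}\neq 0,\\
        \mathrm{Cov}(X_1,X_2\mid X_4) &= 0-\frac{(3)(4)}{26}=-\frac{6}{13}\neq 0.
    \end{align*}
    Hence \(X_1\ci X_2\), \(X_1\notci X_2\cd X_3\) and \(X_1\notci X_2\cd X_4\), and Condition 1 of Definition \ref{rule} with \(C=\emptyset\) assigns both \(X_1\sim X_3\sim X_2\) and \(X_1\sim X_4\sim X_2\) to be colliders. The additional independence \(X_1\ci X_2\cd \{X_3,X_4\}\) corresponds to the vanishing \((1,2)\) entry of the precision matrix,
    \[
        (6)(-2)+(-3)(-4)=0,
    \]
    so \(X_3\) belongs to the separating set \(\{X_3,X_4\}\) but not to the separating set \(\emptyset\), which is the kind of disagreement that violates orientation faithfulness.
\end{remark}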

\subsection{Comparison to Sparsest Permutation (SP) algorithm}
The \emph{Sparsest Markov Representation (SMR)} assumption is the sufficient and necessary consistency condition for SP \citep{UhlSP}, and it is strictly different to the consistency conditions of Me-LoNS in Theorem \ref{prop2}, 
as the following example shows:
\begin{example}[Me-LoNS and SP are different/incomparable]\label{spsucks}
(SMR, but not conditions in Theorem \ref{prop2}). Consider the example from \citet{UhlSP}. Let \(G_0\) be the following:
        \begin{center}
          \begin{tikzpicture}[>=stealth]
\node (e1) at (0,1) {$X_1$};
\node (e2) at (2,0) {$X_2$};
\node (e3) at (0,-1) {$X_3$};
\node (e4) at (-2,0) {$X_4$};
\draw [-{Stealth[scale=1.5]}] (e1) to (e2);
\draw [-{Stealth[scale=1.5]}] (e2) to (e3);
\draw [-{Stealth[scale=1.5]}] (e3) to (e4);
\draw [-{Stealth[scale=1.5]}] (e1) to (e4);

\end{tikzpicture}
\end{center}
 and \(P\) implies the conditional independencies \(X_1 \ci X_3 \cd X_2\) and 
 \(X_2\ci X_4 \cd \{X_1, X_3\}\) and 
 \(  X_1 \ci X_2 \cd X_4\). It can be seen that SMR holds, but adjacency faithfulness is violated.\\
        
(Conditions in Theorem \ref{prop2}, but not SMR). Consider the graph \(G_0\) in Figure \ref{ex1}, with \(P\) implying the conditional independencies \(X_2\ci X_3\) and \(X_1\ci X_3\) and 
\(X_2\ci X_3\cd \{X_1,X_4\}\). 
Here, adjacency faithfulness holds. V-OUS holds since there are no non-colliders to check in \(G_0\), and modified V-stability of \(P\) also holds since V-stability of \(P\) holds. 
        
        \(P\) is Markovian to both \(G_0\) and \(G'\) where \(G'\) differs from \(G_0\) by flipping the edge \(X_2\xrightarrow{} X_4\). \(G_0\) and \(G'\) are both sparsest Markovian graphs to \(P\), but are not Markov equivalent, thus SMR does not hold.

        Note that this counter-example hinges on the fact that singleton transitivity of \(P\) does not hold; otherwise we would have \(X_2\ci X_4\cd X_1\) or \(X_3\ci X_4\cd X_1\), violating adjacency faithfulness. Thus \(P\) cannot be Gaussian.\\
        
\end{example}
\begin{align}
  \epsilon_i, \phi_j &\stackrel{\text{i.i.d.}}{\sim} \mathrm{Bern}(\tfrac{1}{2}), \quad i=1,\ldots,4,\ j=1,\ldots,5\nonumber\\
  X_1 &= (\phi_1, \phi_2, \epsilon_1)\nonumber\\
  X_2 &=  (X^1_1,\phi_3, \epsilon_2)\nonumber\\
  X_3 &=  (\phi_4,\phi_5, \epsilon_3)\nonumber\\
  X_4 &=   (X_1^1+X_3^1, X_2^1+X_3^2, X_2^2, \epsilon_4)\label{sem1}
\end{align}
Here the \(+\) in the structural assignment of \(X_4\) in SEM \ref{sem1} denotes regular addition, and \(X^j_i\) denotes the \(j\)-th entry from the left of \(X_i\).
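
To make the notation concrete, the following is one hand-picked realisation of SEM \ref{sem1}; the particular draws are chosen purely for illustration and are not taken from the simulations. Suppose \(\phi_1=1,\ \phi_2=0,\ \phi_3=1,\ \phi_4=0,\ \phi_5=1\) and \(\epsilon_1=1,\ \epsilon_2=0,\ \epsilon_3=1,\ \epsilon_4=0\). Then
\begin{align*}
  X_1 &= (1,0,1),\\
  X_2 &= (X_1^1, 1, 0) = (1,1,0),\\
  X_3 &= (0,1,1),\\
  X_4 &= (1+0,\ 1+1,\ 1,\ 0) = (1,2,1,0).
\end{align*}
In particular, the second entry of \(X_4\) takes the value \(2\), which is possible only because \(+\) is regular addition rather than addition modulo \(2\).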

\begin{figure}[h]
\begin{center}
\includegraphics[width=0.9\linewidth]{images/drawsp.pdf}
\end{center}
\caption{SEM \ref{sem1} corresponds to the DAG \(G_0\).}
\label{ex1}
\end{figure}
\begin{remark}
    In the supplementary material, \(X_1\ci X_3\cd X_2\) and \(X_2\ci X_3\cd X_1\) are not needed for the example, and are merely byproducts of the construction of SEM \ref{sem1}.
\end{remark}
\begin{figure*}[t]
\begin{center}
\includegraphics[width=0.9\linewidth]{images/Chart.drawio.pdf}
\end{center}
\caption{Diagram relating the results of this paper, with the dotted arrows and shaded texts indicating existing results and the rest of diagram being novel contributions.}
\label{chart1}
\end{figure*} 
Since greedy versions of the SP algorithm have stronger consistency conditions than SP, Example \ref{spsucks} shows that Me-LoNS is a viable alternative to all greedy variants of SP, as Me-LoNS is consistent under different conditions.

To illustrate the incomparability of Me-LoNS and SP from Example \ref{spsucks}, we compare Me-LoNS to the implementation of SP in the  package, \emph{greedy relaxation of sparsest permutation (GRaSP)} \citep{lam}. The input distribution \(P\) is induced by SEM \ref{sem1}, and has all the conditional independencies in Example \ref{spsucks} in addition to \(X_2\ci X_3\cd X_1\) and \(X_1\ci X_3\cd X_2\).

\begin{remark}
    GRaSP cannot determine the direction of the edge \(X_2\xrightarrow{} X_4\) in Figure \ref{ex1}, thus it returns a consistent output about half the time, as shown in Table \ref{spres}. If \(G_0\) consists of \(n\) disconnected components, each being the \(G_0\) in Figure \ref{ex1}, GRaSP will then return a consistent output about \(\tfrac{1}{2^n}\) of the time.
\end{remark}
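
The rate \(\tfrac{1}{2^n}\) can be read as the following back-of-the-envelope calculation, under the simplifying assumptions (made here only for illustration) that GRaSP orients the edge between \(X_2\) and \(X_4\) correctly with probability exactly \(\tfrac{1}{2}\) in each component, independently across components:
\begin{equation*}
  \Pr(\text{all } n \text{ components are oriented consistently}) = \prod_{m=1}^{n} \tfrac{1}{2} = \tfrac{1}{2^n},
\end{equation*}
so that, for instance, \(n=3\) gives \(\tfrac{1}{8}=12.5\%\). This is only a rough guide, since the empirical rate for a single component need not be exactly \(\tfrac{1}{2}\).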

\begin{table}
    \centering
    \caption{Percentage of simulations from SEM \ref{sem1} that the algorithm returns a consistent output.}\label{spres}
    \begin{tabular}{cc}
      \toprule % from booktabs package
      \bfseries GRaSP & \bfseries Me-LoNS\\
      \midrule % from booktabs package
      56\% & 94\%\\
      \bottomrule % from booktabs package
    \end{tabular}
\end{table}

\section{Conclusion and Future Work}
The contributions of this paper are summarised in Figure \ref{chart1}.

The proposed Me-LoNS algorithm has the following desirable properties:
\begin{enumerate}
    \item It is a strict generalisation of the PC algorithm, and is consistent under strictly different conditions than SP.
    \item It has exponential run time which is comparable to the skeleton building step of SGS.
\end{enumerate}
Hence, Me-LoNS provides another option for an algorithm that is consistent strictly beyond faithfulness, and its exponential running time improves on the factorial running time of the SP algorithm \citep{UhlSP}. Although there exist speed-ups of the SP algorithm, such as those based on greedy search like GRaSP used in Section \ref{sim}, these algorithms are faster at the cost of stronger consistency conditions \citep{UhlGSP, lam}.

Note that this work focuses on DAGs. It may be possible to extend it to ancestral graphs, which represent causal systems with latent variables, since the notions of ordered upward and downward stability are well defined for anterial graphs in general \citep{orderfaith}.

% References
\putbib[uai2024-template.bib]
\end{bibunit}
\newpage

\onecolumn

\title{Localised Natural Causal Learning Algorithms for Weak Consistency Conditions\\(Supplementary Material)}
\maketitle

\appendix
    \renewcommand{\bibsection}{\subsubsection*{References (Supplementary Material)}}

\begin{bibunit}[plainnat]
\section{Proofs}
We will use the following well-known results:
\begin{prop}[\citep{lauritzen1996}]\label{simp}
    If   \(P\) is Markovian to \(G\), then \(P\) is pairwise Markovian to \(G\); 
    that is, for every non-adjacent \(i,j\), we have \( i\ci j\cd \text{\emph{an}}_G(i,j)\).
\end{prop}
\begin{prop}[\citep{pearl}]\label{Markovchar}
   The DAGs \(G_1\) and \(G_2\) are Markov equivalent if and only if:
    \begin{enumerate}
        \item \(\text{\emph{sk}}(G_1)=\text{\emph{sk}}(G_2)\).
        \item The set of colliders in \(G_1\) coincides with the set of colliders in \(G_2\).
    \end{enumerate}
\end{prop} 
\begin{prop}[\citep{rams}]\label{adjf}
    \(\text{\emph{sk}}(P)=\text{\emph{sk}}(G)\) if and only if \(P\) is adjacency faithful wrt \(G\).
\end{prop}
\begin{proof}[Proof of Proposition \ref{weak}]
    Let \(i,k,j\) be a collider in the DAG \(G\). Since \(P\) is Markovian to \(G\), we have that 
    \(i\ci j\cd \mathrm{an}_G(i,j)\) by the pairwise Markov property in Proposition \ref{simp}, and by acyclicity of \(G\), 
    we have 
    \(k\not \in \mathrm{an}_G(i,j)\).
\end{proof}

\begin{proof}[Proof of Proposition \ref{mnsrep}]\hspace{2pt}\\
    \textbf{If}: Let \(P\) be V-OUS and collider-stable wrt \(G\). Since \(\text{sk}(P)=\text{sk}(G)\), it suffices to show Item \ref{i2}; that is, for v-configurations \(i\sim k\sim j\) in \(\text{sk}(P)=\text{sk}(G)\), we have:
\begin{enumerate}
    \item If \(i\sim k\sim j\) is assigned to be a collider, then if \(i\sim k\sim j\) is a non-collider in \(G\), V-OUS is violated.
    \item Likewise if \(i\sim k\sim j\) is assigned to be a non-collider \(k\in C\), then if \(i\sim k\sim j\) is a collider in \(G\), collider-stability is violated.
\end{enumerate}
Note that this is because the orientation rules in Definition \ref{rule} are the negations of the V-OUS and collider-stability properties. Thus \(G\) satisfies the orientation rule wrt \(P\).
        
\textbf{Only if}: Let \(G\) satisfy the V-OUS and collider-stable orientation rules wrt \(P\). Then, for v-configurations \(i\sim k\sim j\) in \(G\), Item \ref{i2} gives the following breakdown:
    \begin{enumerate}
    \item
Let \(i\sim k\sim j\) be a collider in \(G\).  It is easy to verify that in both cases, whether it is assigned as a collider or remains unassigned, \(i\sim k\sim j\) is collider-stable wrt \(G\).

%        \item If \(i\sim k\sim j\) is a collider in \(G\), then either:
 %       \begin{enumerate}
  %          \item If \(i\sim k\sim j\) is assigned as a collider, %collider-stability wrt \(G\) then holds for \(i\sim k\sim j\).
   %         then \(i\sim k\sim j\) is collider-stable wrt \(G\).
    %        \item If \(i\sim k\sim j\) is unassigned, then for all \(C\), such that \(k\not \in C\) and \(i\ci j\cd C\), we have that \(i\ci j\cd C\cup k\), collider-stability wrt \(G\) then holds for \(i\sim k\sim j\).
       % \end{enumerate}
        \item 
        Let \(i\sim k\sim j\) be a non-collider in \(G\).
        %If \(i\sim k\sim j\) is a non-collider in \(G\) then either:
        \begin{enumerate}
            \item 
            If \(i\sim k\sim j\) is assigned as a non-collider, then   \(i\sim k\sim j\) is  V-OUS wrt \(G\).
            \item Let  \(i\sim k\sim j\) be unassigned, and $C \subseteq V\setminus\{i,j,k\}$.    If \(i\ci j\cd C\), then we must also have \(i\ci j\cd C\cup \{k\}\), otherwise \(i\sim k\sim j\) would have been assigned to be a collider. Hence \(i\sim k\sim j\) is  V-OUS wrt \(G\).
        \end{enumerate}
    \end{enumerate}
    Thus \(P\) is V-OUS and collider-stable wrt \(G\), and \(\text{sk}(P)=\text{sk}(G)\) follows from Item \ref{i1}.
\end{proof}
To show Theorem \ref{prop2}, we first show Remark \ref{modvequi}:

\begin{proof}[Proof of Remark \ref{modvequi}]
    By Proposition \ref{mnsrep}, \(P\) being modified V-stable is equivalent to the following: the DAGs \(G\) for which \(\text{sk}(P)=\text{sk}(G)\) and \(P\) is V-OUS and collider-stable wrt \(G\) have unique v-configurations. By Proposition \ref{Markovchar}, this is equivalent to all such DAGs being Markov equivalent.
\end{proof}
\begin{prop}\label{propapp}
           Let \(G_1\) and \(G_2\) be Markov equivalent DAGs, with the same distribution $P$.   If \(P\) is V-OUS and collider-stable wrt \(G_1\), then \(P\) is V-OUS and collider-stable wrt \(G_2\).
      \end{prop}
    \begin{proof}
        By Proposition \ref{Markovchar},  \(\text{sk}(G_1)=\text{sk}(G_2)\) and the v-configurations and colliders in \(G_1\) and \(G_2\) coincide. Since \(P\) is the same for both \(G_1\) and \(G_2\), and \(P\) is V-OUS and collider-stable wrt \(G_1\), it is also V-OUS and collider-stable wrt \(G_2\).
        %follows from that of \(G_1\).
    \end{proof}
\begin{proof}[Proof of Theorem \ref{prop2}]
    Denote the output of the algorithm by \(G(P)\).\\
    \textbf{If}: 
        Since \(P\) is adjacency faithful wrt \(G_0\), by Proposition \ref{adjf} and the definition of LoNS, \(\text{sk}(P)=\text{sk}(G_0)=\text{sk}(G(P))\). \(P\) is also V-OUS and collider-stable wrt both \(G_0\) and \(G(P)\), thus \(G_0\) is Markov equivalent to \(G(P)\) by Remark \ref{modvequi}.\\
    \textbf{Only if}: Let \(G(P)\) and \(G_0\) be Markov equivalent.
    \begin{enumerate}
        \item (Adjacency faithfulness). By Proposition \ref{Markovchar}, and since \(G(P)\) is Markov equivalent to \(G_0\), we have that \(\text{sk}(P)=\text{sk}(G(P))=\text{sk}(G_0)\), and by Proposition \ref{adjf}, adjacency faithfulness follows.
        \item (V-OUS and collider-stable). Since \(P\) is V-OUS and collider-stable wrt \(G(P)\), by Proposition \ref{propapp},  \(P\) is V-OUS and collider-stable wrt \(G_0\).  
        \item (Modified V-stability). 
        %By contradiction, if \(P\) is not modified V-stable, then by Remark \ref{modvequi} and Proposition \ref{propapp}, Markov equivalence class of \(G_0\) is not unique, thus there exists \(G(P)\) which are not Markov equivalent to \(G_0\), giving a contradiction.\qedhere
        Appealing to Remark \ref{modvequi}, we note that any DAG satisfying the conditions in the remark is the output of a LoNS algorithm; thus by assumption, these DAGs are Markov equivalent to \(G_0\).  \qedhere
    \end{enumerate}
\end{proof}
To show Proposition \ref{exc}, we use the following from \citet{exc}:
\begin{prop}[\citep{exc}]\label{kayx}
        If \(P\) is exchangeable, then \(P\) satisfying composition is equivalent to \(P\) satisfying upward stability; that is,  for all \(i,j,k\in V\), we have  \(i\ci j\cd C \Rightarrow i\ci j\cd C\cup \{k\}\).
    \end{prop}
    
\begin{proof}[Proof of Proposition \ref{exc}]
    For a non-collider \(i\sim k \sim j\) in \(G\), exchangeability of the marginal \(\{i,j,k\}\) given \(C\subseteq V\backslash\{i,j,k\}\) follows from conditional exchangeability.
    Combined with Proposition \ref{kayx} and composition, the marginal of \(\{i,j,k\}\) conditional on any \(C\subseteq V\backslash{\{i,j,k\}}\) is upward-stable, thus implying V-OUS.  
\end{proof}

\begin{proof}[Proof of Proposition \ref{propconst}]\hspace{2pt}\\
    \textbf{If:} Since there exists a DAG \(G\) to which \(P\) satisfies V-OUS and collider-stability and \(\text{sk}(P)=\text{sk}(G)\), Proposition \ref{mnsrep} guarantees that a DAG that satisfies the V-OUS and collider-stable orientation rule wrt \(P\) exists, and such a DAG will be returned by Step 3 of Me-LoNS. Again by Proposition \ref{mnsrep}, \(P\) will be adjacency faithful and V-OUS and collider-stable wrt this output.
    
    \textbf{Only if:} If there does not exist a DAG \(G\) to which \(P\) satisfies V-OUS and collider-stability and \(\text{sk}(P)=\text{sk}(G)\), then by Proposition \ref{mnsrep}, there is no DAG that satisfies the V-OUS and collider-stable orientation rule wrt \(P\), thus Step 3 of Me-LoNS returns an error.
\end{proof}

\begin{proof}[Proof of Proposition \ref{pcsucks}]
\hspace{8pt}
\begin{enumerate}
    \item (Me-LoNS generalises PC under V-stability). Under V-stability of \(P\), the V-OUS and collider-stable orientation rule wrt \(P\) for assigning colliders becomes the following: for \(i\sim k\sim j\) in \(\text{sk}(P)\), we have
    \begin{equation*}
         \exists C\subseteq V\backslash \{i,j,k\} \quad i\ci j\cd C\And i\notci j\cd C\cup \{k\} \iff \exists C\subseteq V\backslash \{i,j,k\} \quad i\ci j\cd C.
    \end{equation*}
    Note that the RHS is the negation of the V-OUS and collider-stable orientation rule wrt \(P\) when assigning a non-collider, thus the orientation rules reduce to the following:
    \begin{enumerate}
        \item If  \(k\in C\) for all \(C\) such that \(i\ci j\cd C\), then assign \(i\sim k\sim j\) to be a non-collider (unchanged).
        \item Otherwise, assign \(i\sim k\sim j\) to be a collider.
    \end{enumerate}
    This is the same as in PC.
    \item (Me-LoNS works but not PC).  Consider SEM \ref{sem2}, which gives the graph \(G_0\) in Figure \ref{ex2} with the set of conditional independencies being \(X_1\ci X_2\) and \(X_1\ci X_2\cd \{X_3,X_4\}\) and \(X_3\ci X_4\cd \{X_1,X_2\}\). Thus we see that the V-OUS and collider-stable orientation rule orients all the v-configurations correctly, but PC orients the collider \(X_1 \xrightarrow[]{} X_3 \xleftarrow{} X_2\) as a non-collider.\qedhere
\end{enumerate}
\end{proof}

\putbib[uai2024-template.bib]

\end{bibunit}
\end{document}
