%\documentclass{uai2023} % for initial submission
\documentclass[accepted]{uai2023} % after acceptance, for a revised
                                    % version; also before submission to
                                    % see how the non-anonymous paper
                                    % would look like
%% There is a class option to choose the math font
% \documentclass[mathfont=ptmx]{uai2023} % ptmx math instead of Computer
                                         % Modern (has noticable issues)
% \documentclass[mathfont=newtx]{uai2023} % newtx fonts (improves upon
                                          % ptmx; less tested, no support)
% NOTE: Only keep *one* line above as appropriate, as it will be replaced
%       automatically for papers to be published. Do not make any other
%       change above this note for an accepted version.

%% Choose your variant of English; be consistent
%\usepackage[american]{babel}
\usepackage[british]{babel}

%% Some suggested packages, as needed:
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{apalike}
%    \bibliographystyle{agsm}
%    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
% \usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams

% extra packages
\usepackage{amssymb}
\usepackage{MnSymbol}
\usepackage{algorithm}
\usepackage{algorithmic}
\usepackage{ulem}

%% Provided macros
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)

%% Self-defined macros
\newcommand{\swap}[3][-]{#3#1#2} % just an example

\input{TMacros.tex}
\input{JMacros.tex}

\title{Supplement - Establishing Markov Equivalence in Cyclic Directed Graphs}

% The standard author block has changed for UAI 2023 to provide
% more space for long author lists and allow for complex affiliations
%
% All author information is authomatically removed by the class for the
% anonymous submission version of your paper, so you can already add your
% information below.
%
% Add authors
%\author[1]{\href{mailto:<jj@example.edu>?Subject=Your UAI 2023 paper}{Jane~J.~von~O'L\'opez}{}}
\author[1]{Tom~Claassen}
\author[2]{Joris~M.~Mooij}
% Add affiliations after the authors
\affil[1]{%
    Institute for Computing and Information Sciences\\
    Radboud University\\
    Nijmegen, Netherlands
}
\affil[2]{%
    Korteweg-de Vries Institute\\
    University of Amsterdam\\
    Amsterdam, Netherlands
}
  
  \begin{document}
\maketitle

\begin{abstract}
This supplement contains additional results and proof details related to the UAI 2023 submission `Establishing Markov Equivalence in Cyclic Directed Graphs'.
Numbering and notations follow the main article.
\end{abstract}

% ========================================
\section{Additional experimental results} \label{sec:1-ExpEval}
% ========================================
This section elaborates on the random cyclic graph generating process, and presents a result that offers some added insight into the inner workings of the two CPAG algorithms.

\subsection{Generating random cyclic graphs}
In contrast to the familiar acyclic graphs, in cyclic graphs there can be \textit{two} edges between each pair of nodes, corresponding to a total of $N(N-1)$ possible directed edges for graphs over $N$ nodes. However, in both the Erd\H{o}s-R\'enyi model (all graphs with $n$ edges equally likely) and the Gilbert model (all edges appear with equal probability $p$), as the density or size of the graph increases, the resulting graph is overwhelmingly likely to contain a single large strongly connected component, with only a few other nodes on its periphery. As a key part of the CET concerns invariant edges \textit{between} components in rule (iv) (see e.g.\ Figure 3 in the main article), evaluating only on arbitrary random graphs would likely lead to an incomplete or biased perspective. In addition, a number of challenges in finding the correct CPAG are related to sequences of connected two-cycles (see main article, Figure 2), which are also exceedingly unlikely to appear in larger fully random graphs.
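
As a quick check of this count (a worked example added purely for illustration): each of the $\binom{N}{2}$ node pairs can carry an arc in either direction, so
\begin{align*}
N(N-1) &= 2\tbinom{N}{2},\\
N = 3:&\quad 3 \cdot 2 = 6 \text{ possible arcs},\\
N = 200:&\quad 200 \cdot 199 = 39\,800 \text{ possible arcs}.
\end{align*}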

Therefore we tweak the random graph generating process to allow some control over the number and size of the strongly connected components. We introduce a 3-stage process parameterized by size $N$ and density $d$, as well as parameters $p_{two}$ for the proportion of two-cycles, and $p_{acy}$ and $p_{cyc}$ for the proportion of recursive resp.\ nonrecursive edges that remain:
\begin{enumerate}
\item randomly sample the required number of two-cycles, 
\item add random arcs from lower to higher numbered nodes,
\item add completely random arcs for the remaining edges.
\end{enumerate}
Afterwards a random permutation of the nodes is applied to ensure there is no implicit bias in the ordering.
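
The three stages and the final permutation can be sketched as in Algorithm~\ref{alg:sketch-sampler}. This is only a minimal sketch under stated assumptions: the edge budget $n$ (taken here to be derived from $N$ and $d$), the rounding of the proportions, the convention that a two-cycle counts as one edge, and the rejection of duplicate arcs are illustrative choices, and are not a description of the accompanying source code.

\begin{algorithm}[h]
\caption{Sketch of a three-stage random cyclic graph sampler (illustrative assumptions only)}
\label{alg:sketch-sampler}
\begin{algorithmic}[1]
\REQUIRE number of nodes $N$; edge budget $n$ (assumed derived from $N$ and $d$); proportions $p_{two}, p_{acy}, p_{cyc}$ summing to one
\ENSURE arc set $E$ of a directed graph over nodes $1,\ldots,N$
\STATE $n_{two} \leftarrow \mathrm{round}(p_{two}\, n)$; \ $n_{acy} \leftarrow \mathrm{round}(p_{acy}\, n)$; \ $n_{cyc} \leftarrow n - n_{two} - n_{acy}$
\STATE $E \leftarrow \emptyset$
\FOR[stage 1: two-cycles]{$k = 1$ to $n_{two}$}
  \STATE sample a pair $i < j$ uniformly among pairs with no arc between them in $E$
  \STATE add $i \to j$ and $j \to i$ to $E$
\ENDFOR
\FOR[stage 2: acyclic arcs]{$k = 1$ to $n_{acy}$}
  \STATE sample a pair $i < j$ uniformly among pairs with no arc between them in $E$
  \STATE add $i \to j$ to $E$
\ENDFOR
\FOR[stage 3: unrestricted arcs]{$k = 1$ to $n_{cyc}$}
  \STATE sample an ordered pair $(i,j)$, $i \neq j$, uniformly among those with $i \to j \notin E$
  \STATE add $i \to j$ to $E$
\ENDFOR
\STATE relabel the nodes by a uniformly random permutation of $\{1,\ldots,N\}$
\RETURN $E$
\end{algorithmic}
\end{algorithm}

Under these assumptions, stage 2 on its own can never create a cycle, because every arc it adds points from a lower to a higher numbered node; cycles arise only from stages 1 and 3.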

With this procedure, setting $[p_{two},p_{acy}, p_{cyc}] = [0,1,0]$ would lead to a random acyclic graph, whereas setting $[0.1,0.9,0]$ would lead to a random acyclic graph with some edges turned into two-cycles. Setting $[0,0,1]$ would lead to a fully random cyclic graph in the Erd\H{o}s-R\'enyi model.
In practice setting e.g.\ $[p_{two},p_{acy}, p_{cyc}] = [0.1,0.82,0.08]$ leads to a varied number and size of the strongly connected components for graphs of up to $N=200$ nodes with density $d=3.0$. For $N=200$ this leads on average to about 11 nontrivial strongly connected components with average largest component size of about 17 vertices.

For larger or higher-density graphs the $p_{cyc}$ proportion should be reduced to avoid collapsing into the `one big cycle' trap. In our experiments for $d=5.0$ we used $[p_{two},p_{acy}, p_{cyc}] = [0.05,0.93,0.02]$, which, for $N=200$, resulted on average in about 5 nontrivial strongly connected components, with an average largest size of about 70 vertices.
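
To give a feel for these proportions, suppose, purely as an assumption for illustration, that the density $d$ denotes the average number of edges per node, so that the edge budget is $n = dN$, and that the proportions are applied to $n$ directly. For $N=200$ the two settings used above would then split as
\begin{align*}
d = 3.0:&\quad n = 600, & 0.1\,n &= 60, & 0.82\,n &= 492, & 0.08\,n &= 48,\\
d = 5.0:&\quad n = 1000, & 0.05\,n &= 50, & 0.93\,n &= 930, & 0.02\,n &= 20,
\end{align*}
where the three products give the number of two-cycles, acyclic arcs and unrestricted arcs, respectively. The actual counts depend on how $d$ and the two-cycles are accounted for in the implementation.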

Additional implementation details will be published with the accompanying source code.

\subsection{Relative time spent per stage}
To take a closer look at the relative contribution of each stage in the two different CPAG procedures to the overall time complexity we also timed each stage separately. Average results are depicted below.

\begin{figure}[h]
  \centering
  \includegraphics[width=1.0\linewidth]{fig5RelTimeStage.pdf}
\renewcommand\thefigure{5}	
  \caption{\small Plots depicting the relative proportion each algorithm spends on average in the different stages, as a function of the size of the graph $N$, for two different densities $d \in \{3.0, 5.0\}$. Stages are ordered bottom up, i.e.\ first stage on the x-axis, second stage on top of that etc.} 
  \label{fig5PerfResults}
\end{figure}

We see that the original CPAG-from-Graph procedure spends the vast majority of its time in the expensive \textit{d}-separation searches in stages (a) (blue) and (c) (yellow), whereas for sparse graphs the new Graph-to-CPAG version spends roughly equal amounts in each stage. For denser graphs, the final stage of the latter (green), which aims to orient invariant edges within and between cycles, starts to dominate, as expected from the complexity analysis in section 4.3.

Note that, even though it may seem that for higher densities this final stage in the new Graph-to-CPAG procedure is somehow less efficient, it is still about 4 times as fast as the corresponding stage (f) (light-blue/teal) in the original CPAG-from-Graph procedure. It is just that in the latter, stages (a) and (c) are even more expensive.


% ========================================
\section{Proof details}\label{sec:2-Proofs}
% ========================================
\textbf{Lemma 1} \textit{For a directed graph $\G$ and corresponding CMAG $\M$, there is a u-structure $\seq{X,Z,Z',Y}$ in $\M$ iff there is an uncovered itinerary $\pi = \seq{X,Z,U,..,U',Z',Y}$ in $\G$, possibly with $Z=U'$ or $U = U'$, where $\seq{X,Z,U}$ and $\seq{U',Z',Y}$ are a pair of m.e.\ conductors w.r.t.\ the uncovered itinerary $\pi$ in $\G$.}
\begin{proof}
By definition 9, a \textit{u}-structure $\seq{X,Z,Z',Y}$ implies the existence of an uncovered path $\pi = \seq{X,Z,U_1,..,U_k,Z',Y}$ (possibly with $U_1 = U_k$ or $U_1 = Z', U_k = Z$) between nonadjacent $X$ and $Y$ in $\M$, corresponding to an uncovered itinerary in $\G$ where all nodes $\{Z,Z',U_1,..,U_k\}$ are ancestors of each other, but not of $X$ or $Y$, which implies $\seq{X,Z,U_1}$ and $\seq{U_k,Z',Y}$ are a pair of m.e.\ conductors w.r.t.\ the uncovered itinerary $\pi$ in $\G$.

Conversely, if $\seq{X,Z,U}$ and $\seq{U',Z',Y}$ are a pair of m.e.\ conductors w.r.t.\ an uncovered itinerary $\pi = \seq{X,Z,U,..,U',Z',Y}$ in $\G$, then $\pi$ is also an uncovered path $\seq{X,Z,..,Z',Y}$ in $\M$, where all intermediate nodes are ancestors of each other but not of $X$ or $Y$, so that $\{Z,U,..,U',Z'\} \subset SCC(Z)$ and $X \tea Z$ and $Z' \aet Y$ in $\M$, which by definition 9 implies $\seq{X,Z,Z',Y}$ is a \textit{u}-structure.
\end{proof}

% Lemma 2
\textbf{Lemma 2} \textit{In a CMAG $\M$, a pair of nodes $\seq{X,Z}$ is part of a u-structure $\seq{X,Z,Z',Y}$ with a node $Y \in \bfY \subseteq pa(SCC(Z)) \setminus adj(\{X,Z\})$, iff $X \in pa(Z)$, and $X$ and $Y$ are connected in the undirected subgraph over $(SCC(Z) \setminus adj(X)) \cup \{X,Z\} \cup \bfY$.}
\begin{proof}
The given implies the existence of some path from $X$ via adjacent nodes in the undirected subgraph to some node from $\bfY$. Let $Y$ be the first 
node from $\bfY$ encountered along this path, then $\seq{X,Z_1,..,Z_k,Y}$ is a path over distinct nodes where all $Z_i \in SCC(Z)$ are ancestors of each other, but not of $X$ or $Y$.

If the path $\seq{X,Z_1,..,Z_k,Y}$ is not uncovered, then some subsequence  $\seq{X,U_1,..,U_m,Y}$ with $\{U_1,..,U_m\} \subset \{Z_1,..,Z_k\}$ can be chosen so that $\seq{X,U_1,..,U_m,Y}$ is an uncovered path in the unoriented subgraph (see e.g.\ Lemma B.1 in \citealp{Zhang2008}). 
Furthermore, as all nodes adjacent to $X$ in $\M$ are excluded from this subgraph with the exception of $Z$, it means that $Z = U_1 = Z_1$. 
We also know that $m \geq 2$, as all $Y \in \bfY$ were taken not to be adjacent to $Z$, so $Z' = U_m \neq Z$.

Finally, as all $U_i \in SCC(Z)$ are ancestors of each other, but not of $X$ or $Y$, it also means that each consecutive triple along the path is a noncollider in $\M$, and so in accordance with definition 9 $\seq{X,Z,Z',Y}$ is a \textit{u}-structure.
\end{proof}
% =======

For the proof of Theorem 1 we use two helpful results. 

\textbf{Corollary 1} In a CMAG $\M$, a virtual collider triple $\seq{A,B,C}$ uniquely corresponds to either:
\begin{enumerate}
\item a virtual \textit{v}-structure $\seq{A,B,C}$, or 
\item a \textit{u}-structure $\seq{A,B,B',C}$, or 
\item a \textit{u}-structure $\seq{A,B',B,C}$, 
\end{enumerate} 
where for the latter two the complementary triple $\seq{A,B',C}$ is also a virtual collider triple.
\begin{proof}
If virtual collider triple $\seq{A,B,C}$ corresponds to a virtual \textit{v}-structure, then it cannot be part of a \textit{u}-structure $\seq{A,B,B',C}$ or $\seq{A,B',B,C}$, as that would imply the path from $A$ to $C$ via $B$ is not uncovered, contrary to Definition 9. Similarly, if virtual collider triple $\seq{A,B,C}$ corresponds to a \textit{u}-structure $\seq{A,B,B',C}$, then it cannot also correspond to a \textit{u}-structure $\seq{A,B',B,C}$, as the combination would imply the presence of edges $A \tea B$ and $B \aet C$ in $\M$, which again would contradict the fact that the path $\seq{A,B,..,B',C}$ in $\M$ is uncovered.
By definition 10, in both cases the \textit{u}-structure would imply that the complementary $\seq{A,B',C}$ also satisfies the definition of a virtual collider triple.
\end{proof}

The second is a well-known result that connects nodes that make or break a \textit{d}-separation to ancestral relations, where we use square brackets $[\bfZ]$ to indicate \textit{minimal} sets, i.e.\ sets $\bfZ$ for which there is no strict subset of $\bfZ$ that preserves the \textit{d}-separation relation:

\textbf{Lemma 4} In a directed graph $\G$, if adding a node $Z$ to a conditioning set changes a \textit{d}-separation relation between two nodes $X$ and $Y$ relative to a set $\bfZ$, then:
\begin{enumerate}
\item if $\mciig{X}{Y}{\bfZ}{Z}$, then $Z \in an_{\G}(\{X,Y\} \cup \bfZ)$,
\item if $\mcddg{X}{Y}{\bfZ}{Z}$, then $Z \notin an_{\G}(\{X,Y\} \cup \bfZ)$,
\end{enumerate}
with special case: 
\begin{enumerate}
\item[3.] if $\mcig{X}{Y}{\bfZ}$, then $\bfZ \subseteq an_{\G}(\{X,Y\})$. 
\end{enumerate} 
\begin{proof}
This result was originally derived in \citet[Lemma 2]{Claassen2011} for acyclic graphs in the possible presence of unobserved confounders and selection bias, but the proof for the first two rules carries over directly to the cyclic directed case considered here (i.e.\ without confounders and/or selection bias).
The original proof for the special case, where there is no strict subset of $\bfZ$ that can \textit{d}-separate $X$ and $Y$, did use acyclicity, but in the cyclic case the result follows along the same lines as the proof of rule 1:

By contradiction: let $Z \in \bfZ$ be a node that is not in $an_{\G}(\{X,Y\})$. Let $\bfZ' = de(Z) \cap \bfZ$, i.e.\ all descendants of $Z$ in $\bfZ$ (including $Z$ itself). Then none of the nodes in $\bfZ'$ can be an ancestor of $X$ or $Y$, otherwise $Z$ would be an ancestor of $X$ or $Y$ as well, contrary to the assumption. We now show that $X$ and $Y$ are also \textit{d}-separated relative to $\bfZ^\ast = \bfZ \setminus \bfZ'$. Suppose there is an unblocked path $\pi$ between $X$ and $Y$ relative to $\bfZ^\ast$. Then $\pi$ cannot contain any noncolliders in $\bfZ^\ast$, otherwise it would be blocked. But $\pi$ must contain at least one node $Z' \in \bfZ'$ that is a noncollider along $\pi$, otherwise the path could not be blocked by adding $\bfZ'$. Therefore $Z'$ must have at least one outgoing arc along $\pi$. Follow $\pi$ in this direction until either a collider is encountered or the end of $\pi$ is reached. If a collider is reached, then there must be a node $Z^\ast \in \bfZ^\ast$ that is a descendant of that collider, otherwise $\pi$ could not be unblocked. But then this node is a descendant of $Z'$, and so also a descendant of $Z$, which implies it was included in $\bfZ'$, and therefore not in $\bfZ^\ast$. And if the end of the path is reached, then there is a directed path from $Z'$ to $X$ or $Y$, and so also from $Z$ to $X$ or $Y$, contrary to the assumption. Therefore there can be no unblocked path in $\G$ between $X$ and $Y$ relative to $\bfZ^\ast$, which in turn implies they are \textit{d}-separated by $\bfZ^\ast \subsetneq \bfZ$, which implies that $\bfZ$ was not a minimal separating set, contrary to the assumption in rule 3. QED.
\end{proof}
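
To make the first two rules concrete, consider three small graphs; these are illustrative examples of our own choosing (not taken from the main article), each with empty $\bfZ$:
\begin{enumerate}
\item Chain $X \to Z \to Y$: $X$ and $Y$ are \textit{d}-connected given $\emptyset$ and \textit{d}-separated given $\{Z\}$, so adding $Z$ creates the separation. In line with rule 1, $Z \in an_{\G}(\{X,Y\})$, as $Z$ is an ancestor of $Y$.
\item Collider $X \to Z \leftarrow Y$: $X$ and $Y$ are \textit{d}-separated given $\emptyset$ and \textit{d}-connected given $\{Z\}$, so adding $Z$ destroys the separation. In line with rule 2, $Z \notin an_{\G}(\{X,Y\})$.
\item Cyclic graph with arcs $X \to Z$, $Z \to W$, $W \to Z$ and $Y \to W$: the only two paths between $X$ and $Y$ are $X \to Z \to W \leftarrow Y$ (collider $W$) and $X \to Z \leftarrow W \leftarrow Y$ (collider $Z$), so $X$ and $Y$ are \textit{d}-separated given $\emptyset$. Conditioning on $Z$ opens the second path, and indeed $Z \notin an_{\G}(\{X,Y\})$, as the only descendants of $Z$ are $Z$ and $W$.
\end{enumerate}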

We are now ready to prove the new ancestral CET:

% Theorem 1
\textbf{Theorem 1} \textit{Two CMAGs $\M_1$ and $\M_2$ corresponding to cyclic directed graphs $\G_1$ resp.\ $\G_2$ are Markov equivalent iff
\begin{enumerate}  % [label=(\roman*)]
\item[(i)] they have the same skeleton,
\item[(ii)] they have the same \textit{v}-structures,
\item[(iii)] they have the same virtual collider triples,
\item[(iv)] if $\seq{A,B,C}$ and $\seq{A,D,C}$ are virtual collider triples, then $B$ is an ancestor of $D$ in $\M_1$ iff $B$ is an ancestor of $D$ in $\M_2$.
\end{enumerate}}
\begin{proof}
We show that in terms of the CPAG the first 3 rules are equivalent to the first 4 rules in the original CET, and that the last rule is sound and implies the last two rules in the original CET, which means the combined set of rules is sound and sufficient to ensure Markov equivalence.

(i) By lemma  3 (see below), two nodes in a CMAG $\M$ are adjacent iff they are (virtually) adjacent in the underlying cyclic graph $\G$, and so rule (i) is equivalent between the two CETs.

(ii)+(iii) By definitions 4 and 5 and rule (i), an unshielded triple $\seq{A,B,C}$ in a CPAG is either a conductor, an unshielded perfect nonconductor, or an unshielded imperfect nonconductor in $\G$. Therefore (ii).a+(ii).b in the original CET are equivalent to `have the same unshielded perfect and imperfect nonconductors' (as the remaining unshielded triples then all must correspond to conductors). A perfect nonconductor in $\G$ is a \textit{v}-structure in the CMAG $\M$, and by definition 8 the subset of \textit{imperfect} nonconductors is equivalent to virtual \textit{v}-structures. By corollary 1, a virtual collider triple $\seq{A,B,C}$ is either a virtual \textit{v}-structure, or part of a \textit{u}-structure $\seq{A,B,B',C}$ or $\seq{A,B',B,C}$ for which, by definition 10, the complementary $\seq{A,B',C}$ is also a virtual collider triple. By lemma 1, that means that, depending on the skeleton from rule (i), either $\seq{A,B,U}$ and $\seq{U',B',C}$ are a pair of m.e.\ conductors w.r.t.\ uncovered itinerary $\seq{A,B,U,..,U',B',C}$, or $\seq{A,B',U'}$ and $\seq{U,B,C}$ are a pair of m.e.\ conductors w.r.t.\ uncovered itinerary $\seq{A,B',U',..,U,B,C}$. The latter all follow from rule (iii) in the original CET, and therefore rules (ii) + (iii) combined are equivalent to rules (ii).a + (ii).b + (iii) in the original CET.

(iv) Again by corollary 1, the virtual collider triples in rule (iv) either correspond to a virtual \textit{v}-structure (equivalent to an unshielded imperfect nonconductor in $\G$), or are part of a \textit{u}-structure (equivalent to a pair of m.e.\ conductors on an uncovered itinerary in $\G$). Therefore, for rule (iv) we can consider three distinct cases: 1) both virtual collider triples $\seq{A,B,C}$ and $\seq{A,D,C}$ correspond to virtual \textit{v}-structures, 2) one virtual collider triple corresponds to a virtual \textit{v}-structure, and the other is part of a \textit{u}-structure, or 3) both $\seq{A,B,C}$ and $\seq{A,D,C}$ are part of a \textit{u}-structure. Below we will tackle each of these cases in turn:

Case (iv).1: if both $\seq{A,B,C}$ and $\seq{A,D,C}$ correspond to virtual \textit{v}-structures, then they satisfy rule (iv) of the original CET, and so imply that $B$ is an ancestor of $D$ in $\M_1$ iff $B$ is an ancestor of $D$ in $\M_2$, and v.v.\ by symmetry.

Case (iv).2: without loss of generality, assume $\seq{A,B,C}$ is a virtual \textit{v}-structure, and $\seq{A,D,C}$ is part of a \textit{u}-structure $\seq{A,D,D',C}$. Note this implies there cannot be an edge between $C$ and $D$, otherwise $\seq{A,D,D',C}$ would not be a \textit{u}-structure.

Therefore, if $D \tea B$ in $\M_1$, then $D \tea B \aet C$ would be a (virtual) \textit{v}-structure, and so already be invariant by rules (ii)+(iii), which also implies $D \tea B$ in $\M_2$. If $D \aet B$ in $\M_1$, i.e.\ $B$ is NOT a descendant of $D$ in $\M_1$, then by rule (iv) of the original CET, $B$ is also not a descendant of $D$ in $\M_2$, which implies $D \aet B$ in $\M_2$. The only remaining possibility is $D \tet B$ in $\M_1$, which by symmetry then must also apply to $\M_2$.  Therefore rule (iv) is also sound for case 2.

Case (iv).3: now both virtual collider triples $\seq{A,B,C}$ and $\seq{A,D,C}$ are part of a \textit{u}-structure, but neither is a virtual \textit{v}-structure. Then by definition 10, either $A \tea B$ or \mbox{$C \tea B$} is in $\M_1$, so without loss of generality assume $A \tea B$. Then $C$ cannot have an edge to $B$ in $\M_1$, otherwise $\seq{A,B,C}$ would be a (virtual) \textit{v}-structure. 
Similarly, virtual collider triple $\seq{A,D,C}$ implies either $A \tea D$, or $C \tea D$, but not both (or it would be covered by case (iv).2 already). 

(3a) Assume $C \tea D$. Then if $B \tea D$ in $\M_1$, then $B \tea D \aet C$ would be an unshielded collider triple and invariant by rules (ii)+(iii), implying $B \tea D$ in $\M_2$ as well. Similarly, if $B \aet D$, then $A \tea B \aet D$ would be an unshielded collider triple, and so appear in $\M_2$ as well, leaving the only other option $B \tet D$ in $\M_1$ as invariant \textit{u}-structure $\seq{A,B,D,C}$ and therefore $B \tet D$ in $\M_2$ as well.

(3b) Assume $A \tea D$ (so an arc from $A$ into both $B$ and $D$, but $C$ not adjacent to either). Then if $B \tea D$ in $\M_1$, then $D$ is a descendant of $B$, but $D$ is not an ancestor of $B$, i.e.\  $B$ and $D$ belong to different strongly connected components. However, then we have two nonadjacent nodes $B$ and $C$ in $\M_1$, which means they can be \textit{d}-separated by some minimal set $\bfZ$ in the underlying $\G_1$. By lemma 4, rule 3, this means the set $\bfZ$ cannot contain $D$ (as it is not an ancestor of either $B$ or $C$), but including it in the conditioning set would unblock a path via $D$ (as $D$ is a descendant of both $B$ and $C$), i.e.\ $\mcddg{B}{C}{\bfZ}{D}$, which by lemma 4, rule 2, implies that $D$ \textit{cannot} be an ancestor of $B$ (or $C$). Therefore $B \tea D$ is then invariant and appears in $\M_2$ as well. The same holds for the case $B \aet D$, but then with the roles of $B$ and $D$ reversed, leading to $B \aet D$ in $\M_2$ as well. That leaves the case $B \tet D$ in $\M_1$ as the only remaining option, which by symmetry means it must appear in $\M_2$ as well.

Therefore, rule (iv) is also sound for case 3, which implies that indeed rule (iv) in Theorem 1 is sound. As it also covers all instances of rules (iv) and (v) in the original CET it means that, taken together, rules (i)-(iv) of the new CET are sound and imply all invariant features from the original CET. Therefore, Theorem 1 suffices to establish \textit{d}-separation equivalence, which in turn, under the assumed global directed Markov property, ensures `if and only if' Markov equivalence between two CMAGs $\M_1$ and $\M_2$. QED.
\end{proof}


From section 4.1:

% Lemma 3
\textbf{Lemma 3} \textit{In a CMAG $\M$ corresponding to directed graph $\G$, two variables $X$ and $Y$ are adjacent, iff $X$ and $Y$ are (virtually) adjacent in $\G$.}
\begin{proof}
Follows directly from Lemma 1 in \citet{Richardson1997}.
\end{proof}

% Theorem 2
\textbf{Theorem 2} \textit{For two different cyclic directed graphs $\G_1$ and $\G_2$, let $\cP_1$ and $\cP_2$ be the corresponding CPAGs output by (Graph-to-CPAG) algorithm 2. Then $\G_1$ is Markov equivalent to $\G_2$ iff $\cP_1 = \cP_2$.}
\begin{proof}
Soundness of the algorithm follows from Theorem 1, in combination with the fact that each orientation has a direct match to an invariant feature in the CET rules and is therefore sound. As the algorithm processes each rule exhaustively, this guarantees the output is a valid CPAG. 

The remainder of the proof strategy carries over directly from Theorem 2 in \cite{Richardson1996c_DCCS}: if any of the orientations triggers in one graph but not the other, then there must be a difference in one or more \textit{d}-separation statement(s), meaning they are not Markov equivalent. We already showed in the proof of Theorem 1 that CET rules (i)-(iii) are equivalent to the first four rules of the original CET, i.e.\ rules (i), (ii).a, (ii).b and (iii), which (again by the proof of the original Theorem 2) ensures that, for two Markov equivalent graphs, $\cP_1$ and $\cP_2$ have the same skeleton, \textit{v}-structures, and virtual collider triples.

The final orientation rule (iv), corresponding to original CET rules (iv)+(v), has a slightly stronger implication than the original, but still cannot introduce or destroy a virtual collider triple, and so if it triggers in one graph, then it triggers in the other graph. Therefore, if $\cP_1$ and $\cP_2$ differ after processing CET-(iv), then $\G_1$ and $\G_2$ must differ on some invariant feature, and so are not Markov equivalent.
\end{proof}

%=====================

\section{Markov properties for structural causal models}\label{sec:scm}

We state here some of the key definitions and results in the theory of Structural Causal Models (SCMs).
These models, also known as Structural Equation Models (SEMs), were introduced a century ago by \citet{Wright1921} and popularized in AI by \citet{Pearl2009}.
We follow here the treatment of \citet{Bongers++_AOS_21}, as it deals with cycles in a mathematically rigorous way.

\begin{dfn}[SCM]\label{def:SCM}
A Structural Causal Model (SCM) is a tuple $M = \langle \bfV, \bfW, \dom{\bfV}, \dom{\bfW}, \bff, P_M \rangle$ of:
\begin{enumerate}
\item finite disjoint index sets $\bfV, \bfW$ for the endogenous and exogenous variables in the model, respectively;
\item a product of standard measurable spaces $\dom{\bfV} = \prod_{v \in \bfV} \dom{v}$, which define the domains of the endogenous variables; 
\item a product of standard measurable spaces $\dom{\bfW} = \prod_{w \in \bfW} \dom{w}$, which define the domains of the exogenous variables;
\item a measurable function $\bff : \dom{\bfV} \times \dom{\bfW} \to \dom{\bfV}$, the \emph{causal mechanism};
\item a product probability measure $P_M = \prod_{w \in \bfW} P_{w}$ on $\dom{\bfW}$, with each $P_w$ a probability measure on $\dom{w}$, specifying the \emph{exogenous distribution}.
\end{enumerate}
\end{dfn}
The causal structure of the SCM is encoded by the dependences of the components of $\bff$ on the variables in the model. 
This is formalized by:
\begin{dfn}[Parent]
Let $M$ be an SCM. We call $i \in \bfV \cup \bfW$ a parent of $k \in \bfV$ if and only if there does not exist a measurable
function $\tilde f_k : \dom{\bfV\setminus \{i\}} \times \dom{\bfW\setminus\{i\}} \to \dom{k}$ such that 
for $P_M$-almost every $\bfw \in \dom{\bfW}$, for all $\bfv \in \dom{\bfV}$, 
  $$v_k = \tilde f_k(\bfv_{\setminus i},\bfw_{\setminus i}) \iff v_k = f_k(\bfv,\bfw).$$
\end{dfn}
Intuitively, $i$ is a parent of $k$ if the $k$'th component of $\bff$ depends on the $i$'th variable: the existence of such a function $\tilde f_k$ would mean that it does not.
This definition allows us to define the directed mixed graph (DMG) associated to an SCM:
\begin{dfn}[Graph]
Let $M$ be an SCM. The \emph{graph} of $M$, denoted $\G(M)$, is defined as the directed mixed graph
with nodes $\bfV$, directed edges $v_1 \to v_2$ iff $v_1$ is a parent of $v_2$ according to $M$, and bidirected edges
$v_1 \aea v_2$ iff there exists $w \in \bfW$ such that $w$ is parent of both $v_1$ and $v_2$ according to $M$.
\end{dfn}
If $\G(M)$ is acyclic, we call the SCM $M$ \emph{acyclic}; otherwise, we call it \emph{cyclic}. 
If $\G(M)$ contains no bidirected edges, we call the endogenous variables in the SCM $M$ \emph{causally sufficient}
(which is what we assume in the present work for simplicity).

SCMs provide an implicit description of their solutions.
\begin{dfn}[Solutions]
A random variable $\rv{} = (\rv{\bfV},\rv{\bfW})$ is called a \emph{solution} of the SCM $M$ if
$\rv{\bfV} = (\rv{v})_{v \in \bfV}$ with $\rv{v} \in \dom{v}$ for all $v \in \bfV$,
$\rv{\bfW} = (\rv{w})_{w \in \bfW}$ with $\rv{w} \in \dom{w}$ for all $w \in \bfW$,
the distribution $\Prb(\rv{\bfW})$ is equal to the exogenous distribution $P_M$, and
the \emph{structural equations}:
$$\rv{v} = f_v(\rv{\bfV}, \rv{\bfW})\quad\text{a.s.}$$
hold for all $v \in \bfV$.
\end{dfn}

For acyclic SCMs, solutions exist and have a unique distribution that is determined by the SCM.
This is not generally the case in cyclic SCMs, as these could have no solution at all, or 
could have multiple solutions with different distributions.
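
Both failure modes can already be seen in a minimal sketch with a single self-dependent variable (a hypothetical example of our own, chosen only for simplicity).
\begin{exm}[No solution and non-unique solutions]
Let $\bfV = \{1\}$, $\bfW = \{2\}$, $\dom{1} = \dom{2} = \RN$, and let $P_M$ be any probability measure on $\RN$.
If $f_1(\bfv,\bfw) = v_1 + 1$, the structural equation reads $\rv{1} = \rv{1} + 1$ a.s., which no random variable satisfies, so $M$ has no solution.
If instead $f_1(\bfv,\bfw) = v_1$, the structural equation $\rv{1} = \rv{1}$ holds trivially, so every $(\rv{1},\rv{2})$ with $\Prb(\rv{2}) = P_M$ is a solution, and different choices of $\rv{1}$ yield different distributions.
In both cases $1$ is a parent of itself, so $\G(M)$ contains the directed edge $1 \to 1$.
\end{exm}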
\begin{dfn}[Unique solvability]\label{def:unique_solvability_wrt}
An SCM $M$ is said to be \emph{uniquely solvable w.r.t.\ $\bfO \subseteq \bfV$} if there exists 
a measurable mapping $\bfg_{\bfO} : \dom{\pa_{\G(M)}(\bfO)\setminus\bfO} \to \dom{\bfO}$ 
such that for $P_M$-almost every $\bfw \in \dom{\bfW}$, for all $\bfv \in \dom{\bfV}$:
  \begin{equation*}\begin{split}
    &\bfv_{\bfO} = \bfg_{\bfO}(\bfv_{(\pa_{\G(M)}(\bfO)\setminus\bfO)\cap\bfV}, \bfw_{\pa_{\G(M)}(\bfO)\cap\bfW}) \\
    &\quad\iff\quad \bfv_{\bfO} = \bff_{\bfO}(\bfv,\bfw).
  \end{split}\end{equation*}
\end{dfn}
Loosely speaking: the structural equations for $\bfO$ have an essentially unique solution for $\bfv_{\bfO}$ in terms of the other variables appearing in those equations.
If $M$ is uniquely solvable with respect to $\bfV$ (in particular, this holds if $M$ is acyclic), then it induces a unique \emph{observational distribution} $P_M(\rv{\bfV})$, the push-forward of $P_M$ through $\bfg_{\bfV}$.
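The following worked example (our own, with hypothetical coefficients $a$ and $b$) sketches how $\bfg_{\bfV}$ and the observational distribution can be computed in a simple cyclic case.
\begin{exm}[Linear two-cycle]
Let $\bfV = \{1,2\}$, $\bfW = \{3,4\}$, all domains equal to $\RN$, $P_M$ the standard-normal distribution on $\RN^2$, and
  $$\bff(\bfv,\bfw) = (a v_2 + w_3,\; b v_1 + w_4)$$
with $a,b \in \RN\setminus\{0\}$.
Substituting the second structural equation into the first gives $(1-ab)\,v_1 = w_3 + a w_4$.
If $ab \neq 1$, the structural equations are therefore equivalent to
  $$v_1 = \frac{w_3 + a w_4}{1-ab}, \qquad v_2 = \frac{b w_3 + w_4}{1-ab},$$
so $M$ is uniquely solvable w.r.t.\ $\bfV$, with $\bfg_{\bfV}$ given by the right-hand sides.
The observational distribution is then a zero-mean bivariate normal distribution with variances $(1+a^2)/(1-ab)^2$ and $(1+b^2)/(1-ab)^2$ and covariance $(a+b)/(1-ab)^2$.
If $ab = 1$, the equations require $w_3 + a w_4 = 0$, which fails $P_M$-almost surely, so $M$ has no solution.
\end{exm}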

One of the key aspects of SCMs---which we do not discuss here in detail because we do not make use of it in this work---is their causal semantics, which is defined in terms of interventions.
Instead, we discuss only their probabilistic properties.
In particular, under appropriate assumptions, the graph $\G(M)$ of an SCM $M$ represents conditional independences that its solutions must satisfy.
As already shown by \citet{Spirtes1994,Spirtes1995}, the global directed Markov property does \emph{not} hold in general for cyclic SCMs.
\begin{exm}[$d$-separation fails]
Consider the SCM $M = \langle \{1,2,3,4\}, \{5,6,7,8\}, \RN^4, \RN^4, \bff, P_M \rangle$ where
$P_M$ is the standard-normal distribution on $\RN^4$, and the causal mechanism is given by:
  $$\bff(\bfx) = (x_5, x_6, x_1 x_4 + x_7, x_2 x_3 + x_8)$$
%The graph of M is depicted in Figure 1 on the left. 
The graph $\G(M)$ has edges $1 \tea 3$, $2 \tea 4$, $3 \tea 4$, $4 \tea 3$.
This SCM is uniquely solvable with respect to its strongly connected components $\{1\}$, $\{2\}$, and $\{3,4\}$. 
One can check that for every solution $\rv{}$ of $M$, $\rv{1}$ is not independent of $\rv{2}$ given $\{\rv{3}, \rv{4}\}$. 
However, the nodes $1$ and $2$ are $d$-separated given $\{3, 4\}$ in $\G(M)$. 
Hence the global directed Markov property does not hold for $M$.
\end{exm}
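The dependence claimed in this example can be made plausible by the following sketch (our own derivation, assuming the change-of-variables formula is applied off the null set $\{x_1 x_2 = 1\}$).
Solving the structural equations for the cycle $\{3,4\}$ gives, whenever $x_1 x_2 \neq 1$,
  $$x_3 = \frac{x_7 + x_1 x_8}{1 - x_1 x_2}, \qquad x_4 = \frac{x_8 + x_2 x_7}{1 - x_1 x_2}.$$
Conversely, $(x_5,x_6,x_7,x_8) = (x_1,\, x_2,\, x_3 - x_1 x_4,\, x_4 - x_2 x_3)$, and the Jacobian determinant of this map with respect to $(x_1,x_2,x_3,x_4)$ equals $1 - x_1 x_2$.
Writing $\varphi$ for the standard-normal density, the endogenous part of a solution therefore has joint density
  $$p(x_1,x_2,x_3,x_4) = \varphi(x_1)\,\varphi(x_2)\,\varphi(x_3 - x_1 x_4)\,\varphi(x_4 - x_2 x_3)\,\lvert 1 - x_1 x_2\rvert.$$
For fixed $(x_3,x_4)$, the first four factors split into a function of $x_1$ times a function of $x_2$, but the factor $\lvert 1 - x_1 x_2\rvert$ does not, which is why the conditional independence fails.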
For more concrete examples of cyclic SCMs, we refer the reader to \citet{Bongers++_AOS_21}.
\citet{Spirtes1994} proved a weaker Markov property in terms of a `collapsed graph', assuming causal sufficiency and densities.
\citet{ForreMooij_1710.08775} found the following formulation in terms of `$\sigma$-separation' that is immediately applicable to the graph of the SCM itself.
\begin{dfn}[Blockable and unblockable noncolliders]
  Let $\G$ be a directed mixed graph and $\pi$ a path in $\G$.
  We call a noncollider on $\pi$ \emph{unblockable} if it is not an end-node and it only has outgoing edges on $\pi$ to nodes in the same strongly connected component of $\G$; otherwise, it is called \emph{blockable}.
\end{dfn}
If $\G$ is acyclic, then all noncolliders are blockable, because every strongly connected component of $\G$ then consists of a single node.
\begin{dfn}[$\sigma$-separation]
For a triple of node sets $\bfX,\bfY,\bfZ$ in a graph $\G$, we say that $\bfX$ is \emph{$\sigma$-connected} to $\bfY$ given $\bfZ$ iff there is an $X \in \bfX$ and $Y \in \bfY$ such that there is a path $\pi$ between $X$ and $Y$ on which every {\it blockable} noncollider is not in $\bfZ$, and every collider on $\pi$ is an ancestor of $\bfZ$; otherwise $\bfX$ and $\bfY$ are said to be \emph{$\sigma$-separated} given $\bfZ$.
\end{dfn}
Note the small difference from the definition of $d$-connection, which requires \emph{every} noncollider on the path to lie outside $\bfZ$: $\sigma$-connection imposes this requirement only on the {\it blockable} noncolliders. 
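To see how this difference plays out, the definitions can be applied by hand to the graph with edges $1 \tea 3$, $2 \tea 4$, $3 \tea 4$, $4 \tea 3$ from the example above (a sketch of our own, assuming that every node counts as an ancestor of itself).
There are two paths between $1$ and $2$, namely $1 \tea 3 \tea 4$ followed by the edge $2 \tea 4$, and $1 \tea 3$ followed by the edges $4 \tea 3$ and $2 \tea 4$.
On the first path, $3$ is a noncollider in $\{3,4\}$, which blocks the path in the sense of $d$-separation; however, its only outgoing edge on the path points to $4$, which lies in the same strongly connected component $\{3,4\}$, so $3$ is unblockable, and the collider $4$ lies in the conditioning set.
On the second path the roles are interchanged: $4$ is an unblockable noncollider and $3$ is a collider in the conditioning set.
Hence $1$ and $2$ are $d$-separated but $\sigma$-connected given $\{3,4\}$, in line with the dependence noted in that example.
Given the empty set, each path contains a collider that is not an ancestor of $\emptyset$, so $1$ and $2$ are $\sigma$-separated given $\emptyset$.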
The following general result was shown by \citet{ForreMooij_1710.08775}.
\begin{thm}[$\sigma$-Separation Markov property]\label{thm:sigma_separation}
Let $M$ be an SCM that is uniquely solvable w.r.t.\ each strongly connected component of $\G(M)$.
Then, the observational distribution of $M$ exists and is unique.
Furthermore, for a solution $\rv{}$ of $M$ and
for $\bfA,\bfB,\bfC \subseteq \bfV$:
if $\bfA$ is $\sigma$-separated from $\bfB$ given $\bfC$ in $\G(M)$, then $\rv{\bfA}$ is conditionally independent of $\rv{\bfB}$ given $\rv{\bfC}$.
%$$\sigmasep{A}{B}{C}{\G(M)} \implies \rv{A} \indep_{P_M} \rv{B} \given \rv{C}.$$
\end{thm}
\begin{proof}
  See the proof of Theorem A.21 in \citet{Bongers++_AOS_21}. 
\end{proof}
Under certain additional assumptions, one can show the stronger $d$-separation criterion (also known as the global directed Markov property).
\begin{thm}[$d$-Separation Markov property]\label{thm:d_separation}
Let $M$ be an SCM that satisfies one of the following three assumptions:
\begin{enumerate}
  \item $M$ is acyclic;\label{thm:d_separation_acyclic}
  \item \label{thm:d_separation_discrete}
    \begin{itemize}
      \item all endogenous domains $\dom{v}$ for $v \in \bfV$ are discrete, and
      \item $M$ is uniquely solvable w.r.t.\ each ancestral subset $A \subseteq \bfV$ (that is, each subset $A \subseteq \bfV$ such that $\an_{\G(M)}(A) = A$);
  \end{itemize}
\item \label{thm:d_separation_linear}
    \begin{itemize}
      \item $\dom{\bfV} = \RN^{\bfV}$ and $\dom{\bfW} = \RN^{\bfW}$, and
      \item $\bff$ is a linear mapping, and
      \item each $v \in \bfV$ has at least one parent in $\bfW$ according to $M$, and
      \item $P_M$ has a density w.r.t.\ the Lebesgue measure on $\RN^{\bfW}$.
    \end{itemize}
\end{enumerate}
Then, the observational distribution of $M$ exists and is unique.
Furthermore, for a solution $\rv{}$ of $M$ and
for $\bfA,\bfB,\bfC \subseteq \bfV$:
if $\bfA$ is $d$-separated from $\bfB$ given $\bfC$ in $\G(M)$, then $\rv{\bfA}$ is conditionally independent of $\rv{\bfB}$ given $\rv{\bfC}$.
%$$\dsep{A}{B}{C}{\G(M)} \implies \rv{A} \indep_{P_M} \rv{B} \given \rv{C}.$$
\end{thm}
\begin{proof}
See the proof of Theorem A.7 in \citet{Bongers++_AOS_21}.
The acyclic case is well known. 
The discrete case fixes the erroneous theorem by \citet{PearlD1996}, for which a counterexample was found by \citet{Nea00}, by adding the 
assumption of unique solvability with respect to each ancestral subset, and extends it to allow for bidirected edges in the graph. 
The linear case is an extension of existing results for the linear-Gaussian setting without bidirected edges \citep{Spirtes1994, Spirtes1995, Kos96} to a linear (possibly non-Gaussian) setting with bidirected edges in the graph.
\end{proof}
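As a sanity check of the linear case, consider the following sketch (our own illustration, with hypothetical coefficients $a$ and $b$; the condition $ab \neq 1$ is an assumption we add so that the structural equations have a unique solution).
\begin{exm}[A linear cycle for which $d$-separation holds]
Let $\bfV = \{1,2,3,4\}$, $\bfW = \{5,6,7,8\}$, $\dom{\bfV} = \dom{\bfW} = \RN^4$, $P_M$ the standard-normal distribution on $\RN^4$, and
  $$\bff(\bfx) = (x_5,\; x_6,\; x_1 + a x_4 + x_7,\; x_2 + b x_3 + x_8)$$
with $a,b \in \RN\setminus\{0\}$ and $ab \neq 1$.
The graph $\G(M)$ has edges $1 \tea 3$, $2 \tea 4$, $3 \tea 4$, $4 \tea 3$, and the assumptions of case (\ref{thm:d_separation_linear}) are satisfied.
Here $(x_5,x_6,x_7,x_8) = (x_1,\, x_2,\, x_3 - x_1 - a x_4,\, x_4 - x_2 - b x_3)$ has the constant Jacobian determinant $1 - ab$ with respect to $(x_1,x_2,x_3,x_4)$, so with $\varphi$ the standard-normal density the endogenous variables have joint density
  $$p(x_1,x_2,x_3,x_4) = \lvert 1 - ab\rvert\,\varphi(x_1)\,\varphi(x_2)\,\varphi(x_3 - x_1 - a x_4)\,\varphi(x_4 - x_2 - b x_3).$$
This is a product of a function of $(x_1,x_3,x_4)$ and a function of $(x_2,x_3,x_4)$, so $\rv{1}$ is conditionally independent of $\rv{2}$ given $\{\rv{3},\rv{4}\}$, as the $d$-separation of $1$ and $2$ given $\{3,4\}$ predicts.
\end{exm}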
For this paper, we assume that the global directed Markov property holds with respect to a graph that contains no bidirected edges.
From the above theorem, it follows that this will hold if the data comes from the observational distribution of a causally sufficient SCM that 
falls into either the acyclic case (\ref{thm:d_separation_acyclic}), the discrete case (\ref{thm:d_separation_discrete}), or the linear case (\ref{thm:d_separation_linear}).
Note that these assumptions are sufficient, but not necessary.

% References
\bibliography{uai2023-newCET}
\end{document}
