%\documentclass{uai2023} % for initial submission
 \documentclass[accepted]{uai2023} % after acceptance, for a revised
                                    % version; also before submission to
                                    % see how the non-anonymous paper
                                    % would look like
%% There is a class option to choose the math font
% \documentclass[mathfont=ptmx]{uai2023} % ptmx math instead of Computer
                                         % Modern (has noticable issues)
% \documentclass[mathfont=newtx]{uai2023} % newtx fonts (improves upon
                                          % ptmx; less tested, no support)
% NOTE: Only keep *one* line above as appropriate, as it will be replaced
%       automatically for papers to be published. Do not make any other
%       change above this note for an accepted version.

%% Choose your variant of English; be consistent
%\usepackage[american]{babel}
\usepackage[british]{babel}

%% Some suggested packages, as needed:
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{apalike}
%    \bibliographystyle{agsm}
%    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
% \usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams

% extra packages
\usepackage{amssymb}
\usepackage{MnSymbol}
\usepackage{algorithm}
\usepackage{algorithmic}
\usepackage{ulem}

%% Provided macros
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)

%% Self-defined macros
\newcommand{\swap}[3][-]{#3#1#2} % just an example

\input{TMacros.tex}
\input{JMacros.tex}

\title{Establishing Markov Equivalence in Cyclic Directed Graphs}

% The standard author block has changed for UAI 2023 to provide
% more space for long author lists and allow for complex affiliations
%
% All author information is automatically removed by the class for the
% anonymous submission version of your paper, so you can already add your
% information below.
%
% Add authors
%\author[1]{\href{mailto:<jj@example.edu>?Subject=Your UAI 2023 paper}{Jane~J.~von~O'L\'opez}{}}
\author[1]{Tom~Claassen}
\author[2]{\href{mailto:<j.m.mooij@uva.nl>}{Joris~M.~Mooij}{}}
% Add affiliations after the authors
\affil[1]{%
    Institute for Computing and Information Sciences\\
    Radboud University\\
    Nijmegen, Netherlands
}
\affil[2]{%
    Korteweg-deVries Institute\\
    University of Amsterdam\\
    Amsterdam, Netherlands
}
  
  \begin{document}
\maketitle

\begin{abstract}
  We present a new, efficient procedure to establish Markov equivalence between directed graphs that may or may not contain cycles under the \textit{d}-separation criterion.
It is based on the Cyclic Equivalence Theorem (CET) from the seminal work on cyclic models by Thomas Richardson in the mid '90s, but rephrased from an ancestral perspective. The resulting characterization leads to a procedure for establishing Markov equivalence between graphs that no longer requires explicit tests for \textit{d}-separation, which significantly reduces its algorithmic complexity. The conceptually simpler characterization may help to reinvigorate theoretical research towards sound and complete cyclic causal discovery in the presence of latent confounders.
\end{abstract}

% ========================================
\section{Introduction}\label{sec:1-Intro}
% ========================================
Discovering causal relations from observational and experimental data is one of the key goals in many research areas. 
Developing principled, automated causal discovery methods has been an active area of research within the machine learning community, which has resulted in a wide variety of algorithms and techniques.
Two of the main challenges here are handling the impact of unobserved confounders, and the possible presence of feedback mechanisms or cycles in the system under investigation. Both have a long history in the field; in this article we focus solely on the latter.
%Feedback mechanisms play a crucially important role in many application areas. 

%Whether it pertains to global climate patterns or gene regulatory interactions within a single cell, macro-economic policies or biodiversity in complex ecosystems, most real-world systems encompass some form of internal balancing mechanisms that determine their development over time. If we are to understand and analyse the key causal drivers that govern these systems we need to account for such feedback mechanisms, which inevitably means going beyond the well-known causal DAG and its limitations \citep{Robins2003,Dawid2010}. %This work aims to contribute a small step in that direction.



%\mbox{$A \! \relbar \!$ \dashuline{$\!\!\rightarrow \! B \! \leftarrow \!\!$} $ \! \relbar \! C$}

% \newcommand{\tea}{\! \relbar \!\! \rightarrow \!}
%\newcommand{\aet}{\! \leftarrow \!\! \relbar \!}

% \dotuline{A to B}

%Blabla recent work:\cite{Richardson1996b_MEC, RichardsonS1999, Richardson1997, Strobl2018} 

%Other approaches: \cite{ForreM2018, HyttinenEH2012, LacerdaSRH2008, PearlD1996, RothenhauslerHPM2015, Spirtes1994, Spirtes1995}
%Blabla recent work:\cite{Chickering2002, ClaassenB2022, KollerF2009, MooijC2020, Richardson1996a_CCD, Richardson1996b_MEC, RichardsonS1999, Richardson1997, RichardsonS2002, SGS2000, Strobl2018, Tarjan1972, Zhang2008} 
%
%Other approaches: \cite{ForreM2018, HyttinenEH2012, LacerdaSRH2008, PearlD1996, Pearl2009, RothenhauslerHPM2015, Spirtes1994, Spirtes1995}


Building on earlier work by \cite{Spirtes1994, Spirtes1995} on (linear) cyclic directed models that obey the global directed Markov property (see section \ref{sub:graphs}, below), \cite{Richardson1996a_CCD} introduced the Cyclic Causal Discovery (CCD) algorithm, which was able to infer a sound cyclic causal model from independence constraints on data. It was based on the so-called Cyclic Equivalence Theorem \citep{Richardson1997}, which characterizes Markov equivalence between cyclic directed graphs. 

Strangely enough, after this promising start, progress on cyclic directed models slowly ground to a halt, even though many challenges remained: the CCD output was certainly not complete, and it could not account for latent confounders.

In the meantime, theory and methods for acyclic causal discovery took flight; for example, \cite{Zhang2008} extended FCI to a provably sound and complete algorithm under latent confounders and selection bias.

Even to this day, fundamental progress continues to be made: recently, several new and faster algorithms and characterizations for establishing Markov equivalence between maximal ancestral graphs (graphical independence models closed under marginalization and conditioning) have been developed \citep{HuE2020,WienobstBL2022,ClaassenB2022}, ultimately bringing the complexity down to linear for sparse graphs. However, despite a widely acknowledged need to handle feedback cycles in learning algorithms for real-world causal discovery, major steps towards that goal have been few and far between.

A promising attempt to extend CCD to the case of unobserved confounders was made by \citet{Strobl2018}, but although the resulting CCI algorithm was sound, it was by no means complete: it omitted key FCI elements such as discriminating paths and selection bias, and its output was not guaranteed to uniquely identify the Markov equivalence class.

Fundamentally different approaches to cyclic causal discovery have also been developed: for example, \citet{LacerdaSRH2008} employs independent component analysis, \citet{Mooij_et_al_NIPS_11,MooijHeskes_UAI_13} proposed likelihood-based structure learning approaches for additive noise models, \cite{HyttinenEH2012} exploits experiments to build a complete model, and \cite{RothenhauslerHPM2015} builds on information from unknown shift interventions to reconstruct the underlying cyclic causal graph.

On another front, \cite{ForreM2018} showed that for \textit{nonlinear} causal models with cycles and confounders, the usual $d$-separation criterion needs to be replaced with their $\sigma$-separation criterion (see also section 3 in the supplement). More recently, \cite{MooijC2020} showed that vanilla FCI was in fact already sound and complete for these nonlinear cyclic models. However, it does not account for the peculiarities encountered when handling \textit{linear} cyclic models, as in Figure \ref{fig1TwoCycle}.

For linear or discrete cyclic causal models, $\sigma$-separation is too weak, as the stronger \textit{d}-separation applies. Perhaps surprisingly, this significantly complicates the causal structure analysis. But even in nonlinear systems we often consider linear approximations, which means in practice we may expect to encounter similar complications there as well.
In section 3 in the supplement we summarize conditions from the literature under which cyclic causal models are known to satisfy the stronger \textit{d}-separation criterion. For the current paper it suffices to know that we focus on \textit{d}-separation equivalence between cyclic directed graphs without unobserved confounders; for the important class of systems that satisfy the global directed Markov condition together with its corresponding faithfulness assumption, this also implies Markov equivalence.



Part of the reason for the slow progress on cyclic models that satisfy the \textit{d}-separation criterion may be that the associated theoretical machinery developed to characterize Markov equivalence is quite imposing, which may make further extensions towards confounders seem an overly daunting task.

In this article we show that things may not be quite as bad as once feared. For example, establishing Markov equivalence between directed graphs becomes more intuitive when viewed from an \textit{ancestral} perspective, leading to a simplified characterization and an efficient algorithm that greatly speeds up identification.
Although this is of course but a small step, we hope that it may inspire renewed investigation into full-fledged cyclic causal discovery in the presence of latent confounders and selection bias.

%Angle: graphs that satisfy the global directed Markov property

In the rest of the article, section \ref{sec:2-CyclicGraphs} introduces the necessary tools to handle cyclic directed graphs, section \ref{sec:3-NewCET} describes an alternative, ancestral formulation of the CET, section \ref{sec:4-MarkovEq} shows how to infer a graphical characterization of the Markov equivalence class without the need for \textit{d}-separation tests, and section \ref{sec:5-ExpEval} demonstrates the remarkable efficiency of the resulting procedure compared to the current state of the art.
Detailed proofs for all lemmas and theorems, as well as some additional experimental results are provided in the accompanying supplement.


\begin{figure}[h]
  \centering
  \includegraphics[width=0.9\linewidth,page=1]{fig1TwoCycleCPAG.pdf}
  \caption{\small Two different cyclic graphs (left) that together form the only two members of the Markov equivalence class on the right, where the dashed lines signal two \textit{virtual} \textit{v}-structures (see $\S$\ref{sub:introCMAG}). For linear/discrete models conditioning on $C$ would make $A$ and $B$ dependent, but conditioning on $\{C,D\}$ would not.} 
  \label{fig1TwoCycle}
\end{figure}

% ========================================
\section{Cyclic Directed Graphs} \label{sec:2-CyclicGraphs}
% ========================================
In this section we start with a few standard graphical model definitions, and then continue with some perhaps less familiar terminology and results specific to cyclic graphs. 
%Directed graphs and terminology; inducing paths.

\subsection{Graph notations and terminology} \label{sub:graphs}
Throughout this article we use capital letters for vertices/variables, boldface capitals to indicate sets, and calligraphic letters to indicate graphs or distributions.

A \textit{directed graph} (DG) $\G$ is an ordered pair $\seq{\bfV,\bfE}$, where $\bfV$ is a set of vertices (nodes), and $\bfE$ is a set of directed edges (arcs) between vertices. Two nodes in $\G$ are \textit{adjacent} if they are connected by an edge, and two edges are \textit{adjacent} if they share a node. A \textit{path} in the graph $\G$ is a sequence of edges in which each consecutive pair of edges is adjacent and each node occurs at most once, or just a single node (a \textit{trivial} path). A \textit{directed path} $X_0 \tea X_1 \tea .. \tea X_k$ is a path where each pair of consecutive nodes is connected by an arc $X_i \tea X_{i+1}$ in $\G$. A \textit{cycle} is a directed path $X_0 \tea .. \tea X_k$ together with an edge $X_k \tea X_0$. A directed graph with no cycles is called a \textit{directed acyclic graph} (DAG).
If $X \tea Y$ in $\G$ then $X$ is called a \textit{parent} of $Y$, and $Y$ a \textit{child} of $X$. Similarly, if there is a directed path from $X$ to $Y$ in $\G$ then $X$ is an \textit{ancestor} of $Y$, and $Y$ a \textit{descendant} of $X$. We use $pa_{\G}(X)$ to denote the set of parents of $X$ in graph $\G$. 
Likewise, $ch_{\G}(X)$, $an_{\G}(X)$ and $de_{\G}(X)$ denote the sets of children, ancestors, and descendants of $X$ in $\G$, with natural extensions to sets, e.g.\ $pa_{\G}(\bfX) = \{V : \exists X \in \bfX,\ V \in pa_{\G}(X) \}$.
A node $Z$ is a \textit{collider} on a path $\seq{..,X,Z,Y,..}$ if the subpath is of the form $X \tea Z \aet Y$, otherwise it is a \textit{noncollider}. A triple of nodes $\seq{..,X,Z,Y,..}$ on a path is said to be \textit{unshielded} if $X$ and $Y$ are not adjacent in $\G$. An unshielded collider $X \tea Z \aet Y$ is known as a \textit{v-structure}.

A \textit{DG model} is an ordered pair $\seq{\G,\cP}$ where $\G$ is a (cyclic or acyclic) directed graph and $\cP$ is a probability distribution over the vertices (variables) in $\G$. The \textit{global directed Markov property} links the structure of the graph $\G$ to probabilistic independences in $\cP$ via the \textit{d}-separation criterion, by requiring that $\bfX$ and $\bfY$ are conditionally independent given $\bfZ$ in $\cP$ whenever they are \textit{d}-separated given $\bfZ$ in $\G$. Here, for disjoint sets of vertices $\bfX,\bfY,\bfZ$ in a graph $\G$, $\bfX$ is \textit{d-connected} to $\bfY$ given $\bfZ$ iff there is an $X \in \bfX$ and $Y \in \bfY$ such that there is a path $\pi$ between $X$ and $Y$ on which no noncollider is in $\bfZ$, and every collider is an ancestor of $\bfZ$; otherwise $\bfX$ and $\bfY$ are said to be \textit{d-separated} given $\bfZ$. Two graphs $\G_1$ and $\G_2$ are said to be \textit{d-separation equivalent} iff every \textit{d}-separation in $\G_1$ also holds in $\G_2$ and vice versa.
For more details on graphical causal models, see \citep{KollerF2009, SGS2000, Pearl2009, Bongers++_AOS_21}. In section 3 in the supplement, we provide more details on Markov properties in structural causal models, and describe some concrete classes of models for which the \textit{d}-separation criterion applies.
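As a small illustration of the collider clause (a toy example of our own, not taken from the cited works), consider the graph $\G$ with arcs $A \tea C$, $B \tea C$ and $C \tea D$. The only path between $A$ and $B$ is $A \tea C \aet B$, on which $C$ is a collider, so that
\begin{align*}
\bfZ = \emptyset &: \quad C \notin an_{\G}(\bfZ), \text{ hence $A$ and $B$ are \textit{d}-separated given } \bfZ,\\
\bfZ = \{D\} &: \quad C \in an_{\G}(\bfZ), \text{ hence $A$ and $B$ are \textit{d}-connected given } \bfZ,
\end{align*}
even though the collider $C$ itself is not in the conditioning set $\{D\}$.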

\subsection{Features of Cyclic Graphs}
Next we will introduce a few properties and definitions that are specific to directed graphs with cycles.

\begin{dfn} \label{dfn:scc}
In a directed graph $\G$ over a set of vertices $\bfV$, a subset $\bfS \subseteq \bfV$ is a \textbf{strongly connected component (SCC)} of $\G$ iff $\bfS$ is a maximal set of vertices such that every vertex in $\bfS$ is reachable via a directed path in $\G$ from every other vertex in $\bfS$.
\end{dfn}
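Under the convention that the trivial path makes every vertex an ancestor and a descendant of itself, Definition~\ref{dfn:scc} can be restated in terms of the ancestor and descendant sets of $\S$\ref{sub:graphs}. Writing $scc_{\G}(X)$ for the SCC that contains $X$ (notation introduced here for illustration),
\[
scc_{\G}(X) = an_{\G}(X) \cap de_{\G}(X).
\]
For instance, in a hypothetical graph with arcs $A \tea B$, $B \tea C$, $C \tea B$ and $C \tea D$, we have $an_{\G}(B) = \{A,B,C\}$ and $de_{\G}(B) = \{B,C,D\}$, so $scc_{\G}(B) = \{B,C\}$, while $A$ and $D$ form singleton SCCs. The SCCs partition $\bfV$, a graph without self-loops is acyclic iff every SCC is a singleton, and all SCCs can be found in time linear in the number of vertices and edges, e.g.\ with Tarjan's algorithm.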
%Loosely speaking the SCCs correspond to the separate cycle groups in the graph. Every directed graph can be partitioned in a collection of strongly connected components. We use $scc_{\G}(X)$ to denote the strongly connected component in $\G$ to which $X$ belongs. An acyclic graph is a graph where every SCC consists of exactly one vertex.

In cyclic graphs the presence of arcs into directed cycles can create dependencies that behave like additional induced edges:
\begin{dfn} \label{dfn:virtual-edge}
In a graph $\G$, two nodes $A$ and $B$ are said to be \textbf{virtually adjacent} iff there is no edge between $A$ and $B$ in $\G$, but $A$ and $B$ have a common child $C$ which is an ancestor of $A$ or $B$.
\end{dfn}
%\begin{dfn} \label{dfn:p-adjacent}
%In a graph $\G$, two nodes $A$ and $B$ are said to be \textbf{\textit{p}-adjacent} iff there is an edge between $A$ and $B$ in $\G$, or $A$ and $B$ have a common child $C$ which is an ancestor of $A$ or $B$.
%\end{dfn}
Two nodes connected by a virtual edge cannot be \textit{d}-separated by any set of nodes, and therefore appear to be connected by an edge. \citet{Richardson1997} also called such virtually adjacent nodes \textit{p(seudo)-adjacent}. 
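To make this concrete, consider the four-node graph $\G_0$ with arcs $A \tea B$, $B \tea D$, $D \tea B$ and $C \tea D$ (a small example of our own). The nodes $B$ and $C$ are not adjacent, but $D$ is a common child of both and an ancestor of $B$, so $B$ and $C$ are virtually adjacent; likewise, $A$ and $D$ are virtually adjacent via their common child $B$, which is an ancestor of $D$. Checking every candidate separating set $\bfW \subseteq \{A,D\}$ for the pair $B$, $C$ confirms that they cannot be \textit{d}-separated:
\begin{align*}
\bfW \in \{\emptyset, \{A\}\} &: \quad \text{the path } B \aet D \aet C \text{ has no collider and } D \notin \bfW,\\
\bfW \in \{\{D\}, \{A,D\}\} &: \quad \text{the path } B \tea D \aet C \text{ has its only collider } D \in \bfW.
\end{align*}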

These induced virtual edges can also be part of paths we have to consider, giving rise to the generalized concept of an itinerary:
\begin{dfn} \label{dfn:itinerary}
In a graph $\G$, a sequence of vertices $\seq{X_0,...,X_{n+1}}$ where all neighbouring nodes in the sequence are (virtually) adjacent in the graph is said to be an \textbf{itinerary}.
If none of the nodes on the itinerary are (virtually)  adjacent to each other except for the ones that occur consecutively on it then the itinerary is said to be \textbf{uncovered}, otherwise it is said to be \textbf{covered}.
\end{dfn}
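For example, take the graph $\G_0$ with arcs $A \tea B$, $B \tea D$, $D \tea B$ and $C \tea D$ (our own example, in which $\{B,C\}$ and $\{A,D\}$ are the only virtually adjacent pairs). The sequence $\seq{A,B,C}$ is an itinerary that combines the edge $A \tea B$ with the virtual edge between $B$ and $C$. It is uncovered, because $A$ and $C$ are neither adjacent nor virtually adjacent (they have no common child). By contrast, $\seq{A,B,D,C}$ is a covered itinerary, since the non-consecutive pairs $\{A,D\}$ and $\{B,C\}$ are virtually adjacent.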
%These induced \textit{p}-adjacencies (with \textit{p} for `pseudo') can also be part of paths we have to consider, giving rise to the generalized concept of an itinerary:
%\begin{dfn} \label{dfn:itinerary}
%In a graph $\G$, a sequence of vertices $\seq{X_0,...,X_{n+1}}$ where all neighbouring nodes in the sequence are \textit{p}-adjacent in the graph is said to be an \textbf{itinerary}.
%If none of the nodes on the itinerary are \textit{p}-adjacent to each other except for the ones that occur consecutively on it then the itinerary is said to be \textbf{uncovered}; otherwise it is said to be \textbf{covered}.
%\end{dfn}

Virtual edges can also appear in regular (non)collider triples, leading to the generalized notion of (non)conductors:
%The \textit{p}-adjacencies can also behave like regular (non)collider triples, leading to the generalized notion of (non)conductors:
\begin{dfn} \label{dfn:non+conductor}
In a graph $\G$, a triple $\seq{A,B,C}$ forms a \textbf{conductor} if $\seq{A,B,C}$ is an itinerary, and %$A$ and $B$ are (virtually) adjacent, and $B$ and $C$ are (virtually) adjacent, and 
$B$ is an ancestor of $A$ and/or $C$. If $\seq{A,B,C}$ is an itinerary, but $B$ is NOT an ancestor of $A$ or $C$, then $\seq{A,B,C}$ is a \textbf{nonconductor}.
A (non)conductor $\seq{A,B,C}$ is said to be \textbf{unshielded} if $A$ and $C$ are not (virtually) adjacent, otherwise it is \textbf{shielded}.
\end{dfn}
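As an illustration, consider again the graph $\G_0$ with arcs $A \tea B$, $B \tea D$, $D \tea B$ and $C \tea D$ (an example of our own). Here $de_{\G_0}(B) = de_{\G_0}(D) = \{B,D\}$, while $an_{\G_0}(A) = \{A\}$ and $an_{\G_0}(C) = \{C\}$. Applying Definition~\ref{dfn:non+conductor} to three of its triples gives
\begin{align*}
\seq{A,B,D} &: \ B \in an_{\G_0}(D) && \Rightarrow \text{shielded conductor ($A$ and $D$ are virtually adjacent)},\\
\seq{A,B,C} &: \ B \notin an_{\G_0}(\{A,C\}) && \Rightarrow \text{unshielded nonconductor},\\
\seq{A,D,C} &: \ D \notin an_{\G_0}(\{A,C\}) && \Rightarrow \text{unshielded nonconductor}.
\end{align*}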
%\begin{dfn} \label{dfn:non+conductor}
%In a graph $\G$, a triple $\seq{A,B,C}$ forms a \textbf{conductor} if $A$ and $B$ are \textit{p}-adjacent, and $B$ and $C$ are \textit{p}-adjacent, and $B$ is an ancestor of $A$ and/or $C$. If $\seq{A,B,C}$ satisfies the first part, but $B$ is NOT an ancestor of $A$ or $C$, then $\seq{A,B,C}$ is a \textbf{nonconductor}.
%A conductor/nonconductor $\seq{A,B,C}$ is said to be \textbf{unshielded} if $A$ and $C$ are not \textit{p}-adjacent, otherwise it is \textbf{shielded}.
%\end{dfn}

In some cases we can detect the presence of an induced edge, although we can never be sure which one:
\begin{dfn} \label{dfn:im+perfect noncond}
In a graph $\G$ a nonconductor triple $\seq{A,B,C}$ is a \textbf{perfect nonconductor} if $B$ is also a descendant of a common child of $A$ and $C$. If not, then $\seq{A,B,C}$ is an \textbf{imperfect nonconductor}.
\end{dfn}
The key point is that, for unshielded perfect nonconductors, conditioning on any set that includes $B$ \textit{always} creates a dependence between $A$ and $C$, whereas for unshielded imperfect nonconductors conditioning on $B$ alone creates a dependence, but not \textit{every} set containing $B$ does. This is impossible in acyclic graphs and is therefore a hallmark of the presence of cycles. See the two virtual \textit{v}-structures in Figure \ref{fig1TwoCycle} for an example.
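For example, in the graph $\G_0$ with arcs $A \tea B$, $B \tea D$, $D \tea B$ and $C \tea D$ (our own example), the unshielded nonconductor $\seq{A,B,C}$ is imperfect, since $ch_{\G_0}(A) \cap ch_{\G_0}(C) = \{B\} \cap \{D\} = \emptyset$. The only paths between $A$ and $C$ are $A \tea B \aet D \aet C$ and $A \tea B \tea D \aet C$, which yields exactly the behaviour described above:
\begin{align*}
\bfZ = \{B\} &: \quad A \tea B \aet D \aet C \text{ is open, so $A$ and $C$ are \textit{d}-connected},\\
\bfZ = \{B,D\} &: \quad \text{both paths are blocked (by $D$ and $B$, respectively, as noncolliders in $\bfZ$)}.
\end{align*}
In the acyclic graph with arcs $A \tea B \aet C$, by contrast, $B$ is itself a common child of $A$ and $C$, so $\seq{A,B,C}$ is a perfect nonconductor and every set containing $B$ \textit{d}-connects $A$ and $C$.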

Finally, as the pièce de résistance, some patterns introduce an element of nonlocality:
\begin{dfn} \label{dfn:me-cond}
If $\seq{X_0,...,X_{n+1}}$ is a sequence of vertices such that each consecutive triple along the (uncovered) itinerary is a conductor, and all nodes $\{X_1,..,X_n\}$ are ancestors of each other but not ancestors of either $X_0$ or $X_{n+1}$, then the triples $\seq{X_0,X_1,X_2}$ and $\seq{X_{n-1},X_n,X_{n+1}}$ are \textbf{mutually exclusive (m.e.) conductors w.r.t. an (uncovered) itinerary}.
\end{dfn}
An example is depicted in Figure \ref{fig2Ustruct}. As a result, graphs that have identical \textit{d}-separation relations locally everywhere in the graph can still differ regarding a \textit{d}-separation between nodes that are arbitrarily far apart in the graph (something that is impossible in the acyclic case).


\subsection{The Cyclic Equivalence Theorem} \label{sub:orgCET}
With the features introduced above, \cite{Richardson1997} established the following characterization:

\textbf{Cyclic Equivalence Theorem (CET)}: Two directed graphs $\G_1$ and $\G_2$ over vertices $\bfV$ are \textit{d}-separation equivalent iff
\begin{enumerate}
\item[(i)] they have the same (virtual) adjacencies,
\item[(ii).a] they have the same unshielded conductors,
\item[(ii).b] they have the same unshielded perfect nonconductors,
\item[(iii)] two triples $\seq{A,B,C}$ and $\seq{X,Y,Z}$ are mutually exclusive conductors on some uncovered itinerary $P = \seq{A,B,C,..,X,Y,Z}$ in $\G_1$ iff they are also m.e. conductors on some uncovered itinerary in $\G_2$,
\item[(iv)] if $\seq{A,X,B}$ and $\seq{A,Y,B}$ are unshielded imperfect nonconductors in $\G_1$ and $\G_2$, then $X$ is an ancestor of $Y$ in $\G_1$ iff $X$ is an ancestor of $Y$ in $\G_2$,
\item[(v)] if $\seq{A,B,C}$ and $\seq{X,Y,Z}$ are m.e.\ conductors on an uncovered itinerary $P = \seq{A,B,C,..,X,Y,Z}$, and $\seq{A,M,Z}$ is an unshielded imperfect nonconductor (in $\G_1$ and $\G_2$), then $M$ is a descendant of $B$ in $\G_1$ iff $M$ is a descendant of $B$ in $\G_2$.
\end{enumerate}

\begin{figure*}[t]
\begin{center}
%\includegraphics[width=0.9\linewidth,page=1]{./fig2UstructG.pdf}
\includegraphics[scale=0.5]{./fig2UstructG.pdf}
\includegraphics[scale=0.5]{./fig2bUstructCPAG.pdf}
\caption{\small{Two Markov equivalent graphs (left) with $\seq{A,D,F}$ and $\seq{D,F,B}$ a pair of m.e.\ conductors on uncovered itinerary $\seq{A,D,F,B}$; (right) corresponding (maximally informative) CPAG}} \label{fig2Ustruct}
\end{center}
\end{figure*}

\subsection{Cyclic PAGs}
To characterize the (\textit{d}-separation) Markov equivalence class of a cyclic directed graph $\G$, denoted $MEC(\G)$, \cite{Richardson1996b_MEC} described an algorithm that constructs exhaustive lists of all instances in the graph matching each of the individual rules in the CET above. Establishing Markov equivalence %between two cyclic directed graphs 
then boils down to comparing the lists constructed for each.

Later on, \cite{Richardson1996a_CCD} introduced a more intuitive graphical representation in the form of a (cyclic) partial ancestral graph that also captured enough elements to uniquely identify the equivalence class of a directed graph: 

\begin{dfn} \label{dfn:CPAG}
A graph $\cP$ is a \textbf{partial ancestral graph (PAG)} for directed (a)cyclic graph $\G$ with vertex set $\bfV$, iff
\begin{enumerate}
\item[(i)] there is an edge between vertices $A$ and $B$ iff $A$ and $B$ are d-connected given every subset $\bfW \subseteq \bfV \setminus \{A,B\}$,
\item[(ii)] if $A \tem B$ is in $\cP$, then in every graph in $MEC(\G)$, $A$ is an ancestor of $B$,
\item[(iii)] if $A \mea B$ is in $\cP$, then in every graph in $MEC(\G)$, $B$ is NOT an ancestor of $A$,
\item[(iv)] if \mbox{$A \, \ast \!\! \relbar \!\!$ \underline{$\ast \, B \, \ast$} $ \!\! \relbar \!\! \ast \, C$} in $\cP$, then $B$ is an ancestor of $A$ and/or $C$ in every $\G' \in MEC(\G)$,
\item[(v)] if \mbox{$A \! \relbar \!$ \dashuline{$\!\!\rightarrow \! B \! \leftarrow \!\!$} $ \! \relbar \! C$} in $\cP$, then $B$ is NOT a descendant of a common child of $A$ and $C$ in every $\G' \in MEC(\G)$,
\item[(vi)] any remaining edge mark not oriented in the above ways obtains a circle mark $\cem$ in $\cP$.
\end{enumerate}
We use the term \textbf{cyclic PAG (CPAG)} of a graph $\G$ to denote a PAG $\cP$ that captures invariant ancestral relations shared by all and only the collection of graphs $\{\G'\}$ in the Markov equivalence class of $\G$.
\end{dfn}
%In other words, a CPAG $\cP$ of a cyclic graph $\G$ is a PAG that uniquely determines the Markov equivalence class of $\G$.
In these rules the asterisk \mbox{$\ast \!\! \relbar$} mark on an edge is used as a meta symbol that represents any of the other marks $\{-,>,\circ\}$.
The solid underlining in rule (iv), indicating that the middle node is \textit{not} a collider between the other two, is superfluous and therefore often omitted from the graph $\cP$. The dashed underlining in rule (v), however, \textit{is} essential, and unique to cyclic graphs, and appears in the virtual \textit{v}-structures introduced in $\S$\ref{sub:introCMAG}. See Figure \ref{fig2Ustruct} for an example CPAG.

The CPAG has the same purpose and interpretation as the familiar PAG output by the well-known FCI algorithm \citep{SGS2000, Zhang2008}, including circle marks $X \cem Y$ from rule (vi) to explicitly denote `not determined'. This can be either because the implied ancestral relation is not invariant across all members of the Markov equivalence class of $\G$, i.e.\ there are some graphs where $X$ is an ancestor of $Y$ and some where it is not (`can't know'), or because the relation \textit{is} invariant but has not yet been determined (`don't know'). 
As a result, a graph $\G$ can correspond to different CPAGs $\cP$ that differ in level of completeness. 
In this paper we are not concerned with obtaining the (unique) \textit{maximally informative} CPAG, but instead settle for 
any \textit{Markov complete} CPAG that represents a unique (\textit{d}-separation) Markov equivalence class.



%The CPAG is similar in purpose and interpretation to the familiar PAG output by the well-known FCI algorithm (\cite{SGS2000, Zhang2008}), including circle marks to explicitly denote non-committed or non-invariant edge marks, with the only addition being the `dashed-underlining' for certain v-structures in condition (v) of the CPAG definition. 


\subsection{CPAG-from-Graph Algorithm} \label{sub:2.5-CPAGfromG}

Using the CPAG definition above we now describe an algorithm by \cite{Richardson1996c_DCCS} that takes as input a (possibly cyclic) directed graph $\G$ and outputs a CPAG $\cP$ such that two graphs $\G_1$ and $\G_2$ are Markov equivalent iff the algorithm outputs the same CPAG for both. In other words, the algorithm is \textit{d-separation complete}.

\begin{enumerate}
\item[(a)] form the complete undirected graph $\cP$ with all circle edges $\cec$, and then for every edge $A \cec B$ in $\cP$, if $A$ is \textit{d}-separated from $B$ given $\bfC = an_{\G}(\{A,B\}) \setminus \{A,B\}$ then remove edge $A \cec B$ from $\cP$ and record $\bfC$ in $Sepset(A,B)$ and $Sepset(B,A)$,
\item[(b)] for each unshielded triple $A \mem B \mem C$ in $\cP$, orient $A \tea B \aet C$ if $B \notin Sepset(A,C)$,
\item[(c)] for each triple $\seq{A,X,Y}$ such that $X \mem Y$ in $\cP$, $A$ is adjacent to neither $X$ nor $Y$ in $\cP$, and $X \notin Sepset(A,Y)$: if $A$ and $X$ are \textit{d}-connected given $Sepset(A,Y)$, then orient $X \aet Y$,
\item[(d)] for each unshielded triple $A \tea B \aet C$ in $\cP$, if $A$ and $C$ are \textit{d}-separated given a set $\bfR$, then orient \mbox{$A \! \relbar \!$ \dashuline{$\!\!\rightarrow \! B \! \leftarrow \!\!$} $ \! \relbar \! C$} in $\cP$ and record $\bfR$ in $SupSepset\seq{A,B,C}$ (and $SupSepset\seq{C,B,A}$),
\item[(e)] for each quadruple $\seq{A,B,C,D}$, if \mbox{$A \! \relbar \!$ \dashuline{$\!\!\rightarrow \! B \! \leftarrow \!\!$} $ \! \relbar \! C$} in $\cP$, \mbox{$A \tea D \aet C$} or \mbox{$A \! \relbar \!$ \dashuline{$\!\!\rightarrow \! D \! \leftarrow \!\!$} $ \! \relbar \! C$} in $\cP$, $B$ and $D$ are adjacent in $\cP$, then if $D \in SupSepset\seq{A,B,C}$ then orient \mbox{$B \met D$}, otherwise orient $B \tea D$ in $\cP$,
\item[(f)] for each quadruple $\seq{A,B,C,D}$, if \mbox{$A \! \relbar \!$ \dashuline{$\!\!\rightarrow \! B \! \leftarrow \!\!$} $ \! \relbar \! C$} in $\cP$, and $D$ is not adjacent to both $A$ and $C$ in $\cP$, then if $A$ and $C$ are \textit{d}-connected given $SupSepset\seq{A,B,C} \cup \{D\}$, then orient $B \tea D$.
\end{enumerate}
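The first two steps can be summarised in pseudocode as follows. This is only a sketch: it assumes access to a \textit{d}-separation oracle for $\G$ and omits steps (c)--(f).

\begin{algorithm}[h]
\caption{Sketch of steps (a)--(b) of the CPAG-from-Graph algorithm}
\begin{algorithmic}
\REQUIRE directed graph $\G$ over vertices $\bfV$, \textit{d}-separation oracle for $\G$
\ENSURE graph $\cP$ after steps (a) and (b)
\STATE $\cP \leftarrow$ complete graph over $\bfV$ with all edges $\cec$
\FORALL{unordered pairs of distinct vertices $\{A,B\} \subseteq \bfV$}
\STATE $\bfC \leftarrow an_{\G}(\{A,B\}) \setminus \{A,B\}$
\IF{$A$ is \textit{d}-separated from $B$ given $\bfC$ in $\G$}
\STATE remove $A \cec B$ from $\cP$
\STATE $Sepset(A,B) \leftarrow \bfC$; \ $Sepset(B,A) \leftarrow \bfC$
\ENDIF
\ENDFOR
\FORALL{unshielded triples $A \mem B \mem C$ in $\cP$}
\IF{$B \notin Sepset(A,C)$}
\STATE orient $A \tea B \aet C$ in $\cP$
\ENDIF
\ENDFOR
\RETURN $\cP$
\end{algorithmic}
\end{algorithm}

Step (a) thus performs exactly one \textit{d}-separation test per pair of vertices, using a single, fixed candidate separating set for each pair.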

The algorithm has complexity $O(N^7)$, with $N$ the number of vertices, and is \textit{d}-separation complete:

\textbf{Theorem 2} in \citep{Richardson1996c_DCCS}: For two graphs $\G_1$ and $\G_2$, the CPAG-from-Graph algorithm outputs corresponding CPAGs $\cP_1$ and $\cP_2$ that are identical iff $\G_1$ and $\G_2$ are \textit{d}-separation equivalent. %, i.e.\ $\cP_1 = \cP_2$ iff $G_1 \in MEC(\G_2$ and v.v.

Actually, the theorem was formulated for the CCD algorithm \citep{Richardson1996a_CCD} for obtaining a CPAG from (oracle) independence information, but the two are so similar that the proof automatically carries over to the CPAG-from-Graph algorithm. 
The algorithm is an improvement by a factor $O(N^2)$ on the earlier list-based Cyclic Classification algorithm in \citep[$\S5.4$]{Richardson1996b_MEC}. 

% ========================================
\section{An ancestral perspective on the CET} \label{sec:3-NewCET}
% ========================================
Reflecting on the characterization of Markov equivalence between cyclic graphs above, one may note that the rather daunting definitions and terminology in the CET contrast quite sharply with the apparent simplicity of the invariant features contained in the CPAG. This picture is further complicated by the fact that some of these `invariant features', such as edges in the CPAG, are not actually invariant in the underlying graph at all. 

Furthermore, there is no clear match from some rules in the CET to specific invariant features in the CPAG. In particular the `mutually exclusive conductors on an uncovered itinerary'\footnote{Actually this term is a bit of a misnomer, as the two conductors need not be mutually exclusive when there is an induced virtual edge along the uncovered itinerary connecting the two.} in rule CET-(iii) are never explicitly recorded, even though they can of course be inferred from the CPAG afterwards.


A natural question, inspired by the familiar DAG-MAG-PAG triad for acyclic graphs, would be whether it might make sense to also consider an intermediate ancestral stage for cyclic graphs.

In this section we answer that question with an emphatic yes. We first introduce the CMAG as the cyclic analogue of the (acyclic) maximal ancestral graph \citep{RichardsonS2002}, and rephrase the CET in terms of ancestral graphs. This results in a simplified set of rules, each in direct correspondence with invariant features in the CPAG. In the next section we show that this approach also leads to an efficient procedure to establish Markov equivalence that no longer relies on \textit{d}-separation tests.


\subsection{Introducing the CMAG} \label{sub:introCMAG}

In keeping with the spirit of regular (acyclic) maximal ancestral graphs, we will conceptually define a cyclic MAG as: 

\begin{dfn} \label{dfn:CMAG}
The \textbf{cyclic maximal ancestral graph (CMAG)} $\M$ corresponding to a (cyclic) directed graph $\G$ over a set of vertices $\bfV$ is a graph where:
\begin{enumerate} [label=(\roman*)]
\item there is an edge between every distinct pair of vertices $\{X,Y\}$ iff they cannot be \textit{d}-separated by any subset of $\bfV \setminus \{X,Y\}$ in $\G$, 
\item there is a tail mark $X \tem Y$ at vertex $X$ on the edge to $Y$ iff there exists a directed path from $X$ to $Y$ in $\G$, otherwise there is an arrowhead mark $X \aem Y$, 
\item every v-structure $X \tea Z \aet Y$ in $\M$ where $Z$ is not a descendant of a common child of $X$ and $Y$ in $\G$ obtains a dashed underline \mbox{$X \! \relbar \!$ \dashuline{$\!\!\rightarrow \! Z \! \leftarrow \!\!$} $ \! \relbar \! Y$}.
%\item every v-structure $X \tea Z \aet Y$ in $\M$ where $Z$ is not a (descendant of a) common child of $X$ and $Y$ in $\G$ obtains a dashed underline joining the two arrowheads at $Z$.
\end{enumerate}
The `dashed-underlined' triples in a CMAG are referred to as \textbf{virtual v-structures}.
\end{dfn}
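
As an illustration of rules (ii) and (iii), the following is a minimal Python sketch, assuming $\G$ is given as a dictionary mapping each node to its list of children, that descendant sets include the node itself, and that the edge set of $\M$ from rule (i) is already known. The helper names and the five-node example graph ($X \tea Z$, two-cycles between $Z$ and $U$ and between $U$ and $Z'$, and $Y \tea Z'$, with $Z'$ written \texttt{Zp}) are hypothetical and only serve as illustration.

```python
def descendants(children, start):
    """All nodes reachable from `start` by a directed path, including `start`."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in children.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def mark_at(children, x, y):
    """CMAG rule (ii): edge mark at x on an existing edge x - y of M."""
    return "tail" if y in descendants(children, x) else "arrow"


def is_virtual_v_structure(children, x, z, y):
    """CMAG rule (iii) for a v-structure x -> z <- y already present in M:
    dashed underline iff z is not a descendant of a common child of x and y."""
    common = set(children.get(x, ())) & set(children.get(y, ()))
    return not any(z in descendants(children, c) for c in common)


# Hypothetical example: X -> Z, two-cycles Z <-> U and U <-> Zp, Y -> Zp.
g = {"X": ["Z"], "Z": ["U"], "U": ["Z", "Zp"], "Zp": ["U"], "Y": ["Zp"]}
print(mark_at(g, "X", "U"), mark_at(g, "U", "X"))  # tail arrow
print(is_virtual_v_structure(g, "X", "U", "Y"))     # True
```

In this example $\seq{X,U,Y}$ is a \textit{v}-structure in $\M$ even though $X$ and $Y$ share no child in $\G$, which is exactly the situation that rule (iii) flags as virtual.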
%\mbox{$X \! \relbar \!$ \dashuline{$\!\!\rightarrow \! Z \! \leftarrow \!\!$} $ \! \relbar \! Y$}

With this definition, a CPAG becomes a straightforward collection of invariant edges and edge marks (rather than `ancestral relations') shared by all and only the CMAGs corresponding to graphs in the same Markov equivalence class. 

The qualifier `virtual' for the dashed-underlined \textit{v}-structures from rule (iii) emphasises that they look like regular \textit{v}-structures in the CMAG, but look and behave differently in the underlying directed graph $\G$. They are a direct consequence of rule (v) in Def.\ \ref{dfn:CPAG}, and correspond to unshielded imperfect nonconductors in $\G$, which are unique to cyclic graphs.
%that are a tell-tale sign of the presence of cycles.

\subsection{Virtual collider triples}
Having introduced the CMAG, we can make a straightforward mapping from elements in the CET to their ancestral counterparts: (virtual) adjacencies become edges, itineraries become paths, unshielded conductors become standard unshielded noncolliders, unshielded (perfect) nonconductors become v-structures, and unshielded imperfect nonconductors become virtual v-structures.

That only leaves the `mutually exclusive conductors w.r.t.\ an uncovered itinerary'. For that we note that these only appear in the CPAG as the invariant arcs into a cycle, oriented in step (c) of the CPAG-from-Graph algorithm.
In other words, from an ancestral perspective what matters is not the conductor triples at the beginning and end of the uncovered itinerary, but only the first and last edges along the corresponding path in the CMAG.

This brings us to the following definition:

\begin{dfn} \label{dfn:u-struct}
In a CMAG $\M$ for directed graph $\G$, a quadruple of distinct nodes $\seq{X,Z,Z',Y}$ is a \textbf{u-structure} if $X \tea Z$ and $Z' \aet Y$ are in $\M$, $Z' \in SCC_{\G}(Z)$, and there is an uncovered path $\seq{X,Z,..,Z',Y}$ in $\M$ where all intermediate nodes are also in $SCC_{\G}(Z)$.
\end{dfn}
The term \textit{u}-structure reflects the fact that it resembles a \textit{v}-structure, but with the central collider node replaced by an uncovered path through a strongly connected component. Note that SCCs in $\G$ correspond to the node sets of (maximal) connected undirected subgraphs in $\M$, as each node in an SCC is an ancestor of all other nodes in the same SCC.

There is a straightforward connection between \textit{u}-structures and the `m.e.\ conductors w.r.t.\ an uncovered itinerary' from Definition \ref{dfn:me-cond}:
\begin{lem} \label{lem:me=ustruct}
  For a directed graph $\G$ and corresponding CMAG $\M$, there is a u-structure $\seq{X,Z,Z',Y}$ in $\M$ iff there is an uncovered itinerary $\pi = \seq{X,Z,U,..,U',Z',Y}$ in $\G$, possibly with $Z=U'$ or $U = U'$, where $\seq{X,Z,U}$ and $\seq{U',Z',Y}$ are a pair of m.e.\ conductors w.r.t.\ the uncovered itinerary $\pi$ in $\G$.
\end{lem}
(For proofs of this and all subsequent results, see the supplement.)
%\begin{proof}
%By definition \ref{dfn:u-struct} a \textit{u}-structure $\seq{X,Z,Z',Y}$ implies the existence of an uncovered path $\pi = \seq{X,Z,U_1,..,U_k,Z',Y}$ (possibly with $U_1 = U_k$ or $U_1 = Z', U_k = Z$) between nonadjacent $X$ and $Y$ in $\M$, corresponding to an uncovered itinerary in $\G$ where all nodes $\{Z,Z',U_1,..,U_k\}$ are ancestors of each other, but not of $X$ or $Y$, which implies $\seq{X,Z,U_1}$ and $\seq{U_k,Z',Y}$ are a pair of m.e.\ conductors w.r.t.\ the uncovered itinerary $\pi$ in $\G$.
%
%Conversely, if $\seq{X,Z,U}$ and $\seq{U',Z',Y}$ are a pair of m.e.\ conductors w.r.t.\ an uncovered itinerary $\pi = \seq{X,Z,U,..,U',Z',Y}$ in $\G$, then $\pi$ is a also an uncovered path $\seq{X,Z,..,Z',Y}$ in $\M$, where all intermediate nodes are ancestor of each other, and so $\{Z,U,..,U',Z'\} \subset SCC(Z)$, but not of $X$ or $Y$, and so $X \tea Z$ and $Z' \aet Y$in $\M$, which by definition \ref{dfn:u-struct} implies $\seq{X,Z,Z',Y}$  is a \textit{u}-structure.
%\end{proof}

%Before we can go on to construct the CPAG from the CMAG obtained above, we need a way to identify the so-called \textit{u-structures}. These do not play a role in the CMAG, and (contrary to virtual \textit{v}-structures) are not even marked explicitly in the CPAG, but they \textit{are} needed to orient certain invariant edges in the CPAG corresponding to rules (iv) and (v). 

Crucially, \textit{u}-structures are not recorded explicitly in either the CMAG or the CPAG. In fact, the only elements of a \textit{u}-structure that need to be oriented in the CPAG are the first and last edges \textit{into} the strongly connected component (cf.\ step (c) of the CPAG-from-Graph algorithm, \S\ref{sub:2.5-CPAGfromG}).


As a result, we do not have to identify the full quadruple $\seq{X,Z,Z',Y}$ of each \textit{u}-structure, but only need to determine whether an edge $X - Z$ is part of \textit{some} \textit{u}-structure. For that, we can rely on the following result:

%, significantly reducing the algorithmic complexity. However, this still means we have to account for the existence of an `uncovered itinerary'. 

\begin{lem} \label{lem:u-struct-undir}
In a CMAG $\M$, a pair of nodes $\seq{X,Z}$ is part of a u-structure $\seq{X,Z,Z',Y}$ with a node $Y \in \bfY \subseteq pa(SCC(Z)) \setminus adj(\{X,Z\})$, iff $X \in pa(Z)$, and $X$ and $Y$ are connected in the undirected subgraph over $(SCC(Z) \setminus adj(X)) \cup \{X,Z\} \cup \bfY$.
\end{lem}
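
The connectivity test in the lemma above amounts to a breadth-first search. Below is a minimal Python sketch, assuming the CMAG is given by its adjacency sets (edge marks ignored), the set of parents (arc tails) of each node, and the SCC of each node in $\G$; the caller is assumed to pass only candidate nodes $\bfY$ that satisfy the lemma's precondition. The function name is hypothetical, and the hand-built example CMAG corresponds to the graph with $X \tea Z$, two-cycles between $Z$ and $U$ and between $U$ and $Z'$, and $Y \tea Z'$ (with $Z'$ written \texttt{Zp}), in which $\seq{X,Z,Z',Y}$ is a \textit{u}-structure.

```python
from collections import deque


def in_u_structure(adj, pa, scc, x, z, ys):
    """Sketch of the lemma: is <x, z> part of a u-structure <x, z, z', y>
    for some y in ys?

    adj: dict node -> set of neighbours in the CMAG M (marks ignored)
    pa:  dict node -> set of nodes a with an arc a -> node in M
    scc: dict node -> set of nodes in the same SCC of G
    ys:  parents of SCC(z) that are adjacent to neither x nor z in M
    """
    if x not in pa[z]:
        return False
    ys = set(ys)
    allowed = (scc[z] - adj[x]) | {x, z} | ys  # node set of the subgraph
    seen, queue = {x}, deque([x])
    while queue:
        u = queue.popleft()
        for w in adj[u] & allowed:
            if w in ys:
                return True  # reached some node of ys
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False


# Hand-built CMAG for the hypothetical example graph described above.
edges = [("X", "Z"), ("X", "U"), ("Z", "U"), ("Z", "Zp"),
         ("U", "Zp"), ("Y", "U"), ("Y", "Zp")]
adj = {v: set() for v in ["X", "Y", "Z", "U", "Zp"]}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)
pa = {"X": set(), "Y": set(), "Z": {"X"}, "U": {"X", "Y"}, "Zp": {"Y"}}
cycle = {"Z", "U", "Zp"}
scc = {v: (cycle if v in cycle else {v}) for v in adj}
print(in_u_structure(adj, pa, scc, "X", "Z", {"Y"}))  # True
```

Because $X$'s other neighbours in $SCC(Z)$ are removed from the subgraph, every path the search finds must leave $X$ through $Z$, matching the requirement that the \textit{u}-structure starts with the edge $X \tea Z$.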
%\begin{proof}
%The given implies the existence of some path from $X$ via adjacent nodes in the undirected subgraph to some node from $\bfY$. Let $Y$ be the first 
%node from $\bfY$ encountered along this path, then $\seq{X,Z_1,..,Z_k,Y}$ is a path over distinct nodes where all $Z_i \in SCC(Z)$ are ancestors of each other, but not of $X$ or $Y$.
%
%If the path $\seq{X,Z_1,..,Z_k,Y}$ is not uncovered, then some subsequence  $\seq{X,U_1,..,U_m,Y}$ with $\{U_1,..,U_m\} \subset \{Z_1,..,Z_k\}$ can be chosen so that $\seq{X,U_1,..,U_m,Y}$ is an uncovered path in the unoriented subgraph (see e.g.\ Lemma B.1 in \citep{Zhang2008}). 
%Furthermore, as all nodes adjacent to $X$ in $\M$ are excluded from this subgraph with the exception of $Z$, it means that $Z = U_1 = Z_1$. 
%We also know that $m \geq 2$, as all $Y \in \bfY$ were taken not to be adjacent to $Z$, so $Z' = U_m \neq Z$.
%
%Finally, as all $U_i \in SCC(Z)$ are ancestors of each other, but not of $X$ or $Y$, it also means that each consecutive triple along the path is a noncollider in $\M$, and so in accordance with Def.<9> $\seq{X,Z,Z',Y}$ is a u-structure.
%\end{proof}

This significantly reduces the complexity of establishing Markov equivalence later on, as we only need to search over triples rather than quadruples in the CMAG. More importantly, however, this result suggests that virtual \textit{v}-structures and \textit{u}-structures can be seen as two manifestations of the same invariant element, which in turn greatly simplifies the CET.

\begin{dfn} \label{dfn:virtual-col-triple}
In a CMAG $\M$, a triple of distinct nodes $\seq{X,Z,Y}$ is a \textbf{virtual collider triple} iff $\seq{X,Z,Y}$ is a virtual \textit{v}-structure, or there is some $Z' \in SCC(Z)$, such that either $\seq{X,Z,Z',Y}$ or $\seq{X,Z',Z,Y}$ is a \textit{u}-structure.
\end{dfn}
Intuitively, a virtual collider triple $\seq{X,Z,Y}$ implies that $X$ and $Y$ are connected by an uncovered itinerary via nodes in $SCC(Z)$ that \textit{identifiably} contains one or more virtual edges. The strongly connected component of $Z$ fulfils the role of collider in $X \tea SCC(Z) \aet Y$, and the qualifier \textit{virtual} emphasises that there is no `real' collider triple $X \tea Z \aet Y$ in the underlying directed graph.



\begin{figure}[h]
  \centering
  \includegraphics[width=0.9\linewidth,page=1]{fig3CETRule4.pdf}
  \caption{\small Example of CET orientation rule (iv) applied to virtual collider triples $\seq{A,D,B}$ and $\seq{A,E,B}$, yielding the invariant edge $D \tea E$; virtual edges are shown as dashed grey arcs.} 
  \label{fig3CET-4}
\end{figure}

\subsection{A new CET}
We are now ready to restate the Cyclic Equivalence Theorem in terms of CMAGs:

\begin{thm} \label{thm:CET-CMAG}
Two CMAGs $\M_1$ and $\M_2$ corresponding to cyclic directed graphs $\G_1$ resp.\ $\G_2$ are Markov equivalent iff
\begin{enumerate}  % [label=(\roman*)]
\item[(i)] they have the same skeleton,
\item[(ii)] they have the same \textit{v}-structures,
\item[(iii)] they have the same virtual collider triples,
\item[(iv)] if $\seq{A,B,C}$ and $\seq{A,D,C}$ are virtual collider triples, then $B$ is an ancestor of $D$ in $\M_1$ iff $B$ is an ancestor of $D$ in $\M_2$.
%\item[(v)] if $\seq{A,B,C}$ is a virtual v-structure, and either $\seq{A,D,D',C}$ or $\seq{A,D',D,C}$ is a u-structure, then $D$ ($B$) is an ancestor of $B$ ($D$) in $\M_1$ iff $D$ ($B$) is an ancestor of $B$ ($D$) in $\M_2$.
%\item[(ii.b)] they have the same virtual v-structures,
%\item[(iii)] they have the same u-structures,
%\item[(iv)] if $\seq{A,B,C}$ and $\seq{A,D,C}$ are virtual v-structures, then $D$ ($B$) is an ancestor of $B$ ($D$) in $\M_1$ iff $D$ ($B$) is an ancestor of $B$ ($D$) in $\M_2$.
%\item[(v)] if $\seq{A,B,C}$ is a virtual v-structure, and either $\seq{A,D,D',C}$ or $\seq{A,D',D,C}$ is a u-structure, then $D$ ($B$) is an ancestor of $B$ ($D$) in $\M_1$ iff $D$ ($B$) is an ancestor of $B$ ($D$) in $\M_2$.
\end{enumerate}
%\begin{proof}
%We will show that in terms of the CPAG the first 5 rules are equivalent to the first 5 rules in the original CET, and that the last rule is sound and implies the last rule in the original CET, which means the combined set of rules is sound and sufficient to ensure Markov equivalence.
%
%(i) By Lemma  \ref{lem:skeleton} two nodes in a CMAG $\M$ are adjacent iff they are \textit{p}-adjacent in the underlying cyclic graph $\G$, and so rule (i) is equivalent between the two CETs.
%
%(ii) By defs.\ \ref{dfn:non+conductor} and \ref{dfn:im+perfect noncond}, an unshielded triple $\seq{A.B,C}$ in a CPAG is either a conductor, unshielded perfect nonconductor, or an unshielded imperfect nonconductor. Therefore (ii).a+(ii).b in the original CET are equivalent to `have the same unshielded perfect and imperfect nonconductors'. By def.\ref{dfn:non+conductor} a nonconductor in $\G$ is a \textit{v}-structure in the CMAG $\M$, and by def.\ref{dfn:CMAG} the subset of \textit{imperfect} nonconductors is equivalent to a virtual \textit{v}-structure. Therefore together rules (ii).a + (ii).b are equivalent between the two CETs.
%
%(iii) By Lemma \ref{lem:me=ustruct} if the original rule (iii) applies to $\seq{A,B,C,..,X,Y,Z}$ in $\G$, then this implies a \textit{u}-structure $\seq{A,B,Y,Z}$ in $\M$. The reverse is not automatically implied as it involves the two extra nodes $C$ and $X$. However, in terms of the CPAG-from-Graph procedure (section \ref{sub:2.5-CPAGfromG}), in step (c) only the first and last edge (arcs $A \tea B$ and $Y \aet Z$) actually result in orientations in the CPAG, which implies that the original CET rule (iii) does not rely on the exact form of the itinerary, but only on the \textit{existence} of some uncovered itinerary. Therefore in terms of the implication for the CPAG, both rules (iii) are equivalent.
%
%(iv) In the CPAG-from-Graph algorithm this rule only implies an orientation in the CPAG if there is an edge between $B$ and $D$. From the symmetry between the pair of unshielded imperfect nonconductors in the original rule (iv) it directly follows that the orientation holds for both edge marks on the edge $B - D$, and is therefore equivalent to the new rule (iv) for the case of two virtual v-structures. 
%
%(v) As for the previous, this rule only implies an orientation in the CPAG if there is an edge between $B$ and $D$. Clearly the triggering conditions for both versions of rule (v) are equivalent (overlapping virtual \textit{v}-structure and \textit{u}-structure), and so the new rule implies the old orientation $B \tea D$ in step (f) of the original CPAG-from-Graph algorithm.
%
%That leaves two other cases: $B \aet D$, and $B \tet D$ We now show that these are also implied by other rules for Markov equivalent graphs. Note that both $A$ and $C$ are nondescendants of all other nodes involved in the rule.
%
%Case 1, $B \aet D$: if both $A$ and $C$ have an edge to $D$, then $\seq{A,D,C}$ would be a virtual v-structure, otherwise if it was a real v-structure, then $B$ would be a descendant of a common child of $A$ and $C$, contrary the given that $\seq{A,B,C}$ is a virtual \textit{v}-structure. But if both $\seq{A,B,C}$ and $\seq{A,D,C}$ are virtual \textit{v}-strucutures, then they would also satisfy rule (iv) which implies the edge $B \aet D$ would already be oriented. 
%If NOT both $A$ and $C$ have an edge to $D$, then either $A \tea B \aet D$ or $C \tea B \aet D$ would be a v-structure in $\M$, and so be oriented as $B \aet D$ by rule (ii.a)
%
%Case 2, $B \tet D$: similar to case 1, if both $A$ and $C$ have an edge to $D$, then this would again satisfy rule (iv) which would already imply the orientation $B \tet D$. 
%If not, then firstly the original rule (v) would already imply the invariant edge mark $B \met D$, as then $B$ is indeed a descendant of $D$ in $\G$. But then again similar to case 1, if NOT both $A$ and $C$ have an edge to $D$, then either $A \tea B \tet D$ or $C \tea B \tet D$ would be an unshielded noncollider triple in $\M$. Therefore we know $B$ is not a non-ancestor of $D$ (otherwise it would be a \textit{v}-structure oriented by rule (ii.a), and so the only other option is that $B$ must be an ancestor of $D$, i.e.\ $\B \tem D$. 
%
%Together this implies that for all cases that satisfy rule (v) the ancestral relations for both sides of the edge between $B$ and $D$ in $\M$ are invariant between all graphs in the same Markov equivalence class, and therefore correctly oriented by the new CET rule (v).
%
%As a result: all rules in the ancestral CET are sound, and imply rules (i)-(v) in the original CET, which means the combined set of rules is sound and sufficient to ensure Markov equivalence.
%\end{proof}
\end{thm}
%The numbering matches the corresponding rule(s) in the original CET. 
Each rule in this ancestral CET can be linked directly to specific invariant elements in the CPAG: rule (i) to the edges in the CPAG, (ii) to all \textit{v}-structures, (iii) to the remaining invariant arcs into strongly connected components (including the dashed underlines marking virtual \textit{v}-structures), and (iv) to invariant edges within or between (identifiable) cycle components. 
 
Compared to the original CET in $\S$\ref{sub:orgCET}, the ancestral formulation greatly simplifies the characterization of Markov equivalence: it needs two fewer rules, and requires only (collider) triples.

An interesting observation is that in the acyclic case, going from DAGs to MAGs (to allow for unobserved confounders) meant going from `(unshielded) collider triples' (\textit{v}-structures) to `collider triples with order' in the characterization of Markov equivalence between graphs \citep{Ali++2009,ClaassenB2022}. By analogy, we conjecture that in the cyclic case, allowing for latent confounders can similarly be accomplished by extending to `(virtual) collider triples with order'. 


%
%Also note that - contrary to virtual v-structures - the u-structures (or m.e. conductors) do \textit{not} explicitly exclude descendants of a common child of $X$ and $Y$. This is due to the fact that a u-structure also captures another telltale sign of the presence of a cycle, with implications for extensions to maximally informative CPAGs, but we will leave that aspect for future work.

%WRONG: claim in Rich96 (Discovering cyclic causal structure), closing remark p19. It is NOT true that there is an acyclic graph that is d-sep equivalent IFF step c-f in CPAG-from-Graph algorithm do not trigger. It IS true that IF they trigger then there is no such DAG, but that does NOT imply 'only if'!
%Counter example: chordless 4 cycle.
%
%WRONG2: it also claims that step c is not entailed by any DAG, but this is wrong as the standard near-Y structure will make an orientation in step c, even though it is a perfect DAG.


% ========================================
\section{Establishing Markov Equivalence for cyclic graphs} 
\label{sec:4-MarkovEq}
% ========================================
We now show that with the intermediate CMAG representation we can derive a consistent CPAG that uniquely defines the equivalence class of a cyclic directed graph without the need for any \textit{d}-separation tests. 
The resulting algorithm is very fast in practice (see \S\ref{sec:5-ExpEval}), and allows Markov equivalence between graphs to be determined by directly comparing the output CPAGs.

\subsection{Obtaining the CMAG}

To capture the first rule of the new CET, we need to obtain the skeleton of the CPAG. To avoid the \textit{d}-separation tests in step (a) of the CPAG-from-Graph algorithm in $\S$\ref{sub:2.5-CPAGfromG}, we can use the following result:

\begin{lem} \label{lem:skeleton}
In a CMAG $\M$ corresponding to a directed graph $\G$, two variables $X$ and $Y$ are adjacent iff they are (virtually) adjacent in $\G$.
\end{lem}
%\begin{proof}
%Follows directly from Lemma 1 in (Rich.1997).
%\end{proof}

This implies that we can read off the CMAG skeleton directly from $\G$: start from the skeleton of $\G$, and add an edge between $X$ and $Y$ for every \textit{v}-structure $X \tea Z \aet Y$ in $\G$ with $X$ or $Y$ in $SCC(Z)$, i.e.\ where the common child $Z$ is an ancestor of $X$ or $Y$.
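
Lemma \ref{lem:skeleton} translates directly into code. Below is a minimal Python sketch, assuming $\G$ is given as a dictionary mapping each node to its list of children and that node labels are sortable; the function names and the example graph are hypothetical. For brevity it finds SCCs by mutual reachability, which is quadratic rather than linear, and it adds an edge for every pair of parents of $Z$ with one of them in $SCC(Z)$; pairs that are already adjacent in $\G$ are simply absorbed by the edge set.

```python
from itertools import combinations


def descendants(children, start):
    """All nodes reachable from `start` by a directed path, including `start`."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in children.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def cmag_skeleton(children):
    """CMAG skeleton read off from G: all edges of G plus one edge per
    virtual adjacency, returned as a set of frozenset pairs."""
    nodes = set(children) | {w for ws in children.values() for w in ws}
    de = {v: descendants(children, v) for v in nodes}
    scc = {v: {u for u in de[v] if v in de[u]} for v in nodes}
    parents = {v: set() for v in nodes}
    edges = set()
    for u, ws in children.items():
        for w in ws:
            parents[w].add(u)
            edges.add(frozenset((u, w)))  # real edge of G
    for z in nodes:
        for x, y in combinations(sorted(parents[z]), 2):
            if x in scc[z] or y in scc[z]:  # common child z is an ancestor of x or y
                edges.add(frozenset((x, y)))  # virtual adjacency
    return edges


# Hypothetical example: X -> Z, two-cycles Z <-> U and U <-> Zp, Y -> Zp.
g = {"X": ["Z"], "Z": ["U"], "U": ["Z", "Zp"], "Zp": ["U"], "Y": ["Zp"]}
print(len(cmag_skeleton(g)))  # 7
```

The three extra edges $X - U$, $Z - Z'$ and $U - Y$ in the example are all virtual: each is induced by a common child that lies in a cycle with at least one of its parents.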

This does require that we first partition the vertices of the graph into strongly connected components, which can be achieved in time linear in the number of vertices and edges, $O(N + Nd)$, using e.g.\ Tarjan's algorithm \citep{Tarjan1972}\footnote{Actually, we use a modified version of Tarjan's algorithm that also tracks ancestral relations in one go. For details on this and all other algorithms used in the paper, see the source code available at \url{https://github.com/tomc-ghub/CET_uai2023}}.
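
For reference, a textbook recursive version of Tarjan's algorithm is sketched below in Python, assuming $\G$ is given as a dictionary mapping each node to its list of children. It does \textit{not} include the ancestral-relation tracking of the modified version mentioned in the footnote, and the function name and example graph are hypothetical.

```python
def tarjan_scc(children):
    """Tarjan's algorithm: list of SCCs (as frozensets), in reverse
    topological order of the condensation. Recursive, so very deep graphs
    may need a higher recursion limit or an iterative rewrite."""
    nodes = list(dict.fromkeys([*children, *(w for ws in children.values() for w in ws)]))
    index, low = {}, {}
    stack, on_stack, comps = [], set(), []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in children.get(v, ()):
            if w not in index:            # tree edge: recurse
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:           # edge back into the current SCC
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:            # v is the root of an SCC
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            comps.append(frozenset(comp))

    for v in nodes:
        if v not in index:
            visit(v)
    return comps


# Hypothetical example: X -> Z, two-cycles Z <-> U and U <-> Zp, Y -> Zp.
g = {"X": ["Z"], "Z": ["U"], "U": ["Z", "Zp"], "Zp": ["U"], "Y": ["Zp"]}
comps = tarjan_scc(g)
scc_of = {v: c for c in comps for v in c}
print(sorted(len(c) for c in comps))  # [1, 1, 3]
```

Each node and edge is handled a constant number of times, which gives the linear bound quoted above; the \texttt{scc\_of} map then provides the $SCC(\cdot)$ lookups used in the rest of the construction.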
%\footnote{\url{https://github.com/tomc-ghub/CET_uai2023}}

Edges in $\M$ between different SCCs keep their orientation from $\G$, while edges between nodes in the same $SCC$ become undirected, signifying that these nodes are all ancestors of each other.
Induced (virtual) edges between nodes in the same SCC also become undirected, and a virtual edge induced by a triple $X \tea Z \aet Y$ in $\G$ with $X \notin SCC(Z)$ (and hence $Y \in SCC(Z)$) becomes $X \tea Y$.

Alternatively, as in Algorithm \ref{alg:CG-to-CMAG} below, we can process each node $X$ in $\G$ in turn: in $\M$, draw undirected edges between all parents of $X$ in $SCC(X)$ and $X$ itself, and add arcs from all remaining parents of $X$ into this first set (again including $X$).


\begin{algorithm}[h]
  \caption{Cyclic-Graph-to-CMAG}   \label{alg:CG-to-CMAG}
\begin{algorithmic}		% or algpseudocode?
   \STATE{\bfseries Input:} directed cyclic graph $\G$ over nodes $\bfV$
   \STATE{\bfseries Output:} CMAG $\M$, $SCC$s
   \STATE $SCC \gets Get\_StronglyConnComps(\G)$
   \STATE  \textit{part 1: CMAG rules (i) + (ii)}
   \FORALL{$X \in \bfV$}
   \STATE $\bfZ \gets pa_{\G}(X)$
   \STATE $\bfZ_{cyc} \gets \bfZ \cap SCC(X)$
   \STATE $\bfZ_{acy} \gets \bfZ \setminus \bfZ_{cyc}$
   \STATE add all arcs $\bfZ_{acy} \tea \bfZ_{cyc} \cup \{X\}$ to $\M$
   \STATE add all undirected edges $\bfZ_{cyc} \tet \bfZ_{cyc} \cup \{X\}$ to $\M$
   \ENDFOR
   \STATE  \textit{part 2: CMAG rule (iii)}
   \FORALL{$X \in \bfV: |SCC(X)|\geq 2$}
   \STATE $\bfZ \gets pa_{\M}(X)$
   \FORALL{non-adjacent pairs $\{Z_i,Z_j\} \subseteq \bfZ$}
	\STATE \textbf{if}   $\{Z_i,Z_j\} \nsubseteq adj_{\G}(X)$ \textbf{then}
	\STATE \textbf{~~~if}   $X \notin de_{\G}(ch_{\G}(Z_i) \cap ch_{\G}(Z_j))$  \textbf{then}
%    \STATE ~~~~~~mark virtual v-structure \mbox{$Z_i \! \relbar \!$ \dashuline{$\!\!\rightarrow \! X \! \leftarrow \!\!$} $ \! \relbar \! Z_j$} in $\M$
    \STATE ~~~~~~mark virtual v-structure $\seq{Z_i,X,Z_j}$ in $\M$
   \ENDFOR
   \ENDFOR
\end{algorithmic}
\end{algorithm}

The second part of Algorithm \ref{alg:CG-to-CMAG} simply involves checking all \textit{v}-structures in $\M$ with central collider node in a non-trivial SCC, and with at least one virtual edge in $\G$. Here we use the matrix of ancestral relations, constructed when identifying the SCCs at the start of the algorithm, to reduce the `descendant of' check in the second `if'-clause to constant time per node.
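
To make the two parts of Algorithm \ref{alg:CG-to-CMAG} concrete, here is a minimal Python sketch, assuming $\G$ is given as a dictionary mapping each node to its list of children and that node labels are sortable. Edge marks are stored per ordered pair, so that \texttt{mark[(a, b)]} is the mark at \texttt{a} on the edge between \texttt{a} and \texttt{b}. As a simplification, full descendant sets computed by search stand in for the ancestral matrix, so the `descendant of' check is a set lookup but the preprocessing is not linear; all names and the example graph are hypothetical.

```python
from itertools import combinations


def descendants(children, start):
    """All nodes reachable from `start` by a directed path, including `start`."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in children.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def graph_to_cmag(children):
    """Sketch of Cyclic-Graph-to-CMAG. Returns (mark, virtual): mark[(a, b)]
    is 'tail' or 'arrow' at a on the edge a - b of M; virtual is the set of
    virtual v-structures (zi, x, zj)."""
    nodes = sorted(set(children) | {w for ws in children.values() for w in ws})
    de = {v: descendants(children, v) for v in nodes}  # stands in for the ancestral matrix
    scc = {v: {u for u in de[v] if v in de[u]} for v in nodes}
    pa_g = {v: set() for v in nodes}
    for u, ws in children.items():
        for w in ws:
            pa_g[w].add(u)
    mark = {}

    # Part 1: CMAG rules (i) + (ii), one node x at a time.
    for x in nodes:
        z_cyc = pa_g[x] & scc[x]  # parents of x in SCC(x)
        z_acy = pa_g[x] - z_cyc   # remaining parents of x
        for a in z_acy:
            for b in z_cyc | {x}:
                mark[(a, b)], mark[(b, a)] = "tail", "arrow"
        for a, b in combinations(sorted(z_cyc | {x}), 2):
            mark[(a, b)] = mark[(b, a)] = "tail"

    # Part 2: CMAG rule (iii), only for x in a nontrivial SCC.
    virtual = set()
    for x in nodes:
        if len(scc[x]) < 2:
            continue
        adj_g = pa_g[x] | set(children.get(x, ()))
        pa_m = sorted(a for a in nodes
                      if mark.get((a, x)) == "tail" and mark.get((x, a)) == "arrow")
        for zi, zj in combinations(pa_m, 2):
            if (zi, zj) in mark or {zi, zj} <= adj_g:
                continue  # adjacent in M, or both edges are real edges of G
            common = set(children.get(zi, ())) & set(children.get(zj, ()))
            if not any(x in de[c] for c in common):
                virtual.add((zi, x, zj))
    return mark, virtual


# Hypothetical example: X -> Z, two-cycles Z <-> U and U <-> Zp, Y -> Zp.
g = {"X": ["Z"], "Z": ["U"], "U": ["Z", "Zp"], "Zp": ["U"], "Y": ["Zp"]}
mark, virtual = graph_to_cmag(g)
print(len(mark) // 2, sorted(virtual))  # 7 [('X', 'U', 'Y')]
```

Storing both directions of each edge keeps the test for `parent in $\M$' (tail at the parent, arrowhead at the child) a pair of dictionary lookups.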

%The resulting graph $\M$ contains all ancestral relations from directed graph $\G$, but is not quite a proper CMAG yet, as it does not distinguish the virtual \textit{v}-structures from rule (iii) in Definition \ref{dfn:CMAG}. For that we use the second step:

%\begin{algorithm}[h]
%  \caption{(C)MAG-to-CMAG}   \label{alg:MAG-to-CMAG}
%\begin{algorithmic}		% or algpseudocode?
%   \STATE{\bfseries Input:} directed cyclic graph $\G$ over nodes $\bfV$, (C)MAG $\M$
%   \STATE{\bfseries Output:} CMAG $\M$, 
%%   \STATE $\C \gets \M$
%   \FORALL{$X \in \bfV$}
%   \STATE $\bfZ \gets pa_{\M}(X)$
%   \FORALL{non-adjacent pairs $\{Z_i,Z_j\} \subseteq \bfZ$}
%	\STATE \textbf{if}   $\{Z_i,Z_j\} \nsubseteq adj_{\G}(X)$ \textbf{then}
%	\STATE \textbf{~~~if}   $X \notin de_{\G}(ch_{\G}(Z_i) \cap ch_{\G}(Z_j)$  \textbf{then}
%%    \STATE ~~~~~~mark virtual v-structure \mbox{$Z_i \! \relbar \!$ \dashuline{$\!\!\rightarrow \! X \! \leftarrow \!\!$} $ \! \relbar \! Z_j$} in $\M$
%    \STATE ~~~~~~mark virtual v-structure $\seq{Z_i,X,Z_j}$ in $\M$
%   \ENDFOR
%   \ENDFOR
%\end{algorithmic}
%\end{algorithm}


\subsection{Constructing the CPAG}

% \subsection{Identifying u-structures ? (merge)}

Before we can construct the CPAG from the CMAG $\M$ obtained above, we still need to recognise the virtual collider triples that correspond to \textit{u}-structures. Unlike virtual \textit{v}-structures, these are not marked explicitly in the CMAG, but they \textit{are} needed to orient certain invariant edges in the CPAG corresponding to rules (iii) and (iv) in Theorem \ref{thm:CET-CMAG}. Fortunately, for this we can rely on Lemma \ref{lem:u-struct-undir}: as it only requires a connectivity test in an undirected subgraph, the cost of each such test scales linearly with the number of edges in that subgraph.

 
%Before we can go on to construct the CPAG from the CMAG obtained above, we still need to identify the so-called \textit{u-structures}. These do not play a role in the CMAG, and (contrary to virtual \textit{v}-structures) are not even marked explicitly in the CPAG, but they \textit{are} needed to orient certain invariant edges in the CPAG corresponding to rules (iv) and (v). 
 
%As a result, we do not even have to identify the full quadruple $\seq{X,Z,Z',Y}$ of each \textit{u}-structure, but only if an edge $X - Z$ is part of \textit{some} u-structure pattern, significantly reducing the algorithmic complexity. However, this still means we have to account for the existence of an `uncovered itinerary'. Fortunately for that, in the context of a CMAG, we can rely on the following result:

%\begin{lem} In a CMAG $\M$, a pair of nodes $\seq{X,Z}$ is part of a u-structure $\seq{X,Z,Z',Y}$ with a node $Y \in \bfY \subseteq pa(SCC(Z)) \setminus adj(\{X,Z\})$, iff $X \in pa(Z)$, and $X$ and $Y$ are connected in the undirected subgraph over $((SCC(Z) \setminus adj(X)) \cup \{X,Z\} \cup \bfY$.
%\end{lem}
%\begin{proof}
%The given implies the existence of some path from $X$ via adjacent nodes in the undirected subgraph to some node from $\bfY$. Let $Y$ be the first 
%node from $\bfY$ encountered along this path, then $\seq{X,Z_1,..,Z_k,Y}$ is a path over distinct nodes where all $Z_i \in SCC(Z)$ are ancestors of each other, but not of $X$ or $Y$.
%
%If the path $\seq{X,Z_1,..,Z_k,Y}$ is not uncovered, then some subsequence  $\seq{X,U_1,..,U_m,Y}$ with $\{U_1,..,U_m\} \subset \{Z_1,..,Z_k\}$ can be chosen so that $\seq{X,U_1,..,U_m,Y}$ is an uncovered path in the unoriented subgraph (see e.g.\ Lemma B.1 in \citep{Zhang2008}). 
%Furthermore, as all nodes adjacent to $X$ in $\M$ are excluded from this subgraph with the exception of $Z$, it means that $Z = U_1 = Z_1$. 
%We also know that $m \geq 2$, as all $Y \in \bfY$ were taken not to be adjacent to $Z$, so $Z' = U_m \neq Z$.
%
%Finally, as all $U_i \in SCC(Z)$ are ancestors of each other, but not of $X$ or $Y$, it also means that each consecutive triple along the path is a noncollider in $\M$, and so in accordance with Def.<9> $\seq{X,Z,Z',Y}$ is a u-structure.
%\end{proof}


It also means that, to capture the invariant arcs arising from \textit{u}-structures in the CPAG, we only need to consider edges $X \tea Z$ in $\M$ that are not yet oriented in $\cP$ and where $Z$ is part of a nontrivial SCC (i.e.\ $|SCC(Z)| \geq 2$); the set $\bfY$ in Lemma \ref{lem:u-struct-undir} then consists of all other parents of $SCC(Z)$ that are adjacent to neither $X$ nor $Z$ in $\M$.
 Note that the arcs oriented in this way were previously found by the exhaustive search in step (c) of the CPAG-from-Graph algorithm in \S\ref{sub:2.5-CPAGfromG}.  

Finally, note that rule (iv) of our new CET in Theorem \ref{thm:CET-CMAG} applies to all virtual collider triples, but that in the construction of the CPAG, as in the original CPAG-from-Graph algorithm, we only need to consider cases where at least one of the two triples is a virtual \textit{v}-structure: it follows from the original CET in $\S$\ref{sub:orgCET} that the case of two \textit{u}-structures is superfluous.

We can now bring these steps together in Algorithm \ref{alg:G-to-CPAG}.

\begin{algorithm}[h]
  \caption{Graph-to-CPAG}   \label{alg:G-to-CPAG}
\begin{algorithmic}		% or algpseudocode?
   \STATE{\bfseries Input:} directed cyclic graph $\G$ over nodes $\bfV$
   \STATE{\bfseries Output:} CPAG $\cP$
   \STATE $(\M,SCC) \gets$ \mbox{Cyclic-Graph-to-CMAG}$(\G)$ 
%   \STATE $\M \gets$ \mbox{MAG\_to\_CMAG}$(\G,\M')$  
   \STATE \textit{part 1: new-CET rules (i)-(iii)}
   \STATE $\cP \gets$ skeleton of $\M$ with all $\cec$ edges
   \STATE $\cP \gets$ copy all (virtual) \textit{v}-structures from $\M$ 
   \FORALL{$X \cec Z$ in $\cP$, $X \tea Z$ in $\M$, $|SCC(Z)| \geq 2$}
   \STATE \textbf{if}   $\exists \seq{X,Z,Z',Y}$ as \textit{u}-structure in $\M$ \textbf{then}
   \STATE ~~~orient $X \tea Z$ in $\cP$ ~~~~~~~~~~~~~~~ \COMMENT{\textit{Lemma \ref{lem:u-struct-undir}}}
   \ENDFOR
   \STATE \textit{part 2: new-CET rule (iv)}
   \FORALL{virtual \textit{v}-structures $\seq{X,Z,Y}$ in $\cP$}
   \FORALL{not fully oriented edges $Z \mem W$ in $\cP$}
   \IF{$\seq{X,W,Y}$ is virtual collider triple}
   \STATE copy edge $Z \mem W$ from $\M$ to $\cP$
   \ENDIF
%   \STATE $(Z,W)_{\M\tea\cP}$ for all virtual collider triples $\seq{X,W,Y}$
%   \STATE $(Z,W)_{\M\tea\cP}$ for all virtual \textit{v}-structures $\seq{X,W,Y}$
%   \STATE $(Z,W)_{\M\tea\cP}$ for all \textit{u}-structures $\seq{X,W,W',Y}$
   \ENDFOR
   \ENDFOR
\end{algorithmic}
\end{algorithm}

In practice, to improve efficiency, we already copy invariant features to the CPAG while constructing the CMAG. Note that the output CPAG is \textit{d}-separation complete, but \textit{not} guaranteed to be identical to the CPAG from the original CPAG-from-Graph algorithm. This is because step (c) of the latter performs an exhaustive search that also orients certain arcs that are sound but not needed for the CET; such arcs could instead be obtained from subsequent orientation rules, similar to those in the PC/FCI algorithms. Conversely, the new algorithm can orient some edges in its last step that are not guaranteed to be found by the original version. Therefore the CPAGs from the two algorithms cannot be compared against each other to establish Markov equivalence. However, the main result remains the same: 

\begin{thm}
For two different cyclic directed graphs $\G_1$ and $\G_2$, let $\cP_1$ and $\cP_2$ be the corresponding CPAGs output by Algorithm \ref{alg:G-to-CPAG}. Then $\G_1$ is Markov equivalent to $\G_2$ iff $\cP_1 = \cP_2$.
\end{thm}
%\begin{proof}
%Soundness of the algorithm follows from Theorem \ref{thm:CET-CMAG}, in combination with the fact that each orientation has a direct match to an invariant feature in the CET rules and is therefore sound. As the algorithm processes each rule exhaustively, this guarantees the output is a valid CPAG. 
%
%Remainder of the proof strategy carries over directly from Theorem 2 in \cite{Richardson1996c_DCCS}: if any of the orientations triggers in one graph but not the other, then there must be a difference in one or more d-separation statement(s) meaning they are not Markov equivalent. We already showed in the proof of Theorem 1 that CET rules (i)-(iv) were equivalent between the two versions, which (again by the proof of the original Theorem 2) ensures that for two Markov equivalent graphs $\cP_1$ and $\cP_2$ have the same skeleton, v-structures, virtual v-structures, and u-structures, and the same edges between virtual v-structures.
%The final orientation step, corresponding to CET-(v), has a slightly stronger implication than the original rule (v), but still cannot introduce or destroy a (virtual) v- or u-structure, and so if it triggers in one graph, then it triggers in the other graph. Therefore, if $\cP_1$ and $\cP_2$ differ on CET-(v), then $\G_1$ and $\G_2$ must differ on an invariant edge, and so are not Markov equivalent.
%\end{proof}


\subsection{Computational complexity} \label{sub:complexity}
The scaling behaviour of Algorithm \ref{alg:G-to-CPAG} depends primarily on the number of vertices $N$ and the average node degree $d$, corresponding to $N*d$ edges in the graph. 

The first part of Algorithm \ref{alg:CG-to-CMAG} requires $O(N + N*d)$ steps to find the strongly connected components, followed by a loop over $N$ vertices comparing $d^2$ pairs of parents, so $O(N*d^2)$ overall.
Similarly, the second part considers $d^2$ pairs of parents for $N$ nodes, for $O(N*d^2)$ (provided the $Get\_StronglyConnComps$ step also tracks the ancestral matrix for constant-time descendant checks).
Subsequent steps initializing the skeleton and (virtual) \textit{v}-structures are also $O(N*d^2)$.
Next, for the \textit{u}-structures we may need to loop over $O(N*d)$ edges and establish connectedness in a subgraph over at most $N$ nodes, which takes $O(N*d)$ steps (similar to the SCC procedure), leading to $O(N^2*d^2)$ overall.
Finally, we need to loop over $O(N*d^2)$ virtual \textit{v}-structures, considering links to $d$ other edges, while testing for connectedness in $O(N*d)$, giving a total of $O(N^2*d^3)$.

So the overall worst-case complexity scales as $O(N^2*d^3)$, or $O(N^5)$ for arbitrary density, which is a significant improvement over the $O(N^7)$ of the current state-of-the-art CPAG-from-Graph algorithm.
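
Collecting the bounds above, and reading the final step as $d$ edge checks plus a single $O(N*d)$ connectedness test per virtual \textit{v}-structure (our interpretation of the accounting in the previous paragraph), the worst-case total can be summarised as
\begin{align*}
T(N,d) &= O(N + N d) + O(N d^2) && \text{CMAG + init}\\
&\quad + O(N d) \cdot O(N d) && \text{\textit{u}-struct.}\\
&\quad + O(N d^2) \cdot \bigl(O(d) + O(N d)\bigr) && \text{rule (iv)}\\
&= O(N^2 d^3) && (d \geq 1),
\end{align*}
which gives $O(N^5)$ after substituting $d = O(N)$.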

In practice, even for large graphs there is typically only a relatively small number of cases to consider in the final steps, and so for both procedures the actual scaling behaviour is usually much better than this worst-case bound suggests, as evidenced by the next section.

% ========================================
\section{Experimental evaluation} \label{sec:5-ExpEval}
% ========================================
To evaluate the performance of the CPAG-from-graph procedures as a function of the size and density of the graph, we generate collections of random directed cyclic graphs and track both average and worst-case performance, measured in the number of elementary operations and in running time.
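
A minimal sketch of how the average and worst-case figures could be collected for one setting of $(N, d)$ is given below, in Python; the names \texttt{time\_instances} and \texttt{summarise} are hypothetical and not taken from the accompanying source code. The same summary applies to counts of elementary operations, assuming the algorithm under test is instrumented to report them.

\begin{verbatim}
```python
import statistics
import time


def time_instances(algorithm, instances):
    """Wall-clock seconds of algorithm(g), per g."""
    times = []
    for g in instances:
        start = time.perf_counter()
        algorithm(g)
        times.append(time.perf_counter() - start)
    return times


def summarise(costs):
    """Average and worst case over a collection of
    costs (times or operation counts) and their ratio."""
    avg = statistics.mean(costs)
    worst = max(costs)
    ratio = worst / avg if avg > 0 else float("inf")
    return {"average": avg, "worst": worst,
            "ratio": ratio}
```
\end{verbatim}

The \texttt{ratio} entry is the worst-case versus average comparison discussed for Figure \ref{fig4PerfResults}.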

Note that, to generate the random cyclic graphs, we introduced a few parameters that control the number and type of cycles included. Without them, truly random cyclic graphs of increasing size and density quickly tend to collapse into the `one big cycle' type, with a single large strongly connected component, which avoids most of the intricacies of CET rules (iv) and (v) that relate to invariant edges between cycles; see Section 1.1 in the supplement for details.
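
As an illustration only, the Python sketch below follows a simple three-stage scheme of this kind: sample two-cycles, add arcs from lower- to higher-numbered nodes (which on their own cannot form a cycle), fill the remaining edge budget with uniformly random arcs, and finally apply a random permutation of the node labels. The actual generator and its settings are described in the supplement; the function name \texttt{random\_cyclic\_graph}, the parameters $p_{\mathrm{two}}$ and $p_{\mathrm{acy}}$ as fractions of a fixed edge budget, and the use of rejection sampling are our assumptions.

\begin{verbatim}
```python
import random


def random_cyclic_graph(n, n_edges, p_two, p_acy,
                        seed=None):
    """Hypothetical three-stage generator: a random
    directed graph on nodes 0..n-1 with n_edges arcs.
    The remaining fraction 1 - p_two - p_acy of arcs is
    fully random (the role of p_cyc)."""
    assert min(p_two, p_acy) >= 0
    assert p_two + p_acy <= 1 + 1e-9
    # Sparse enough for every stage to be feasible.
    assert n_edges <= n * (n - 1) // 2
    rng = random.Random(seed)
    arcs = set()
    # Stage 1: two-cycles u <-> v (two arcs each).
    n_two = int(p_two * n_edges / 2)
    while len(arcs) < 2 * n_two:
        u, v = rng.sample(range(n), 2)
        arcs.update({(u, v), (v, u)})
    # Stage 2: arcs from lower to higher numbered
    # nodes; on their own these never form a cycle.
    target = len(arcs) + int(p_acy * n_edges)
    while len(arcs) < target:
        u, v = sorted(rng.sample(range(n), 2))
        arcs.add((u, v))
    # Stage 3: uniformly random arcs for the rest.
    while len(arcs) < n_edges:
        u, v = rng.sample(range(n), 2)
        arcs.add((u, v))
    # Random relabelling: node order carries no signal.
    perm = list(range(n))
    rng.shuffle(perm)
    return {(perm[u], perm[v]) for u, v in arcs}
```
\end{verbatim}

With $(p_{\mathrm{two}}, p_{\mathrm{acy}}) = (0, 1)$ this yields a random acyclic graph, and with $(0, 0)$ a fully random directed graph; intermediate settings give varying numbers and sizes of strongly connected components.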

%\subsection{Generating random cyclic graphs}
%In contrast to the familiar acyclic graphs, in cyclic graphs there can be \textit{two} edges between each pair of nodes, corresponding to a total of $N(N-1)$ possible directed edges for graphs over $N$ nodes. However, in both the Erdos-Renyi model (all graphs with $n$ edges equally likely) and the Gilbert model (all edges appear with equal probability $p$), as density or size of the graph increases, the resulting graph is overwhelmingly likely to contain just one, big strongly connected component, with only a few other nodes on its periphery. As a key part of the CET is about invariant edges \textit{between} components (rules 4+5), just evaluating on arbitrary random graphs would likely lead to an incomplete or biased perspective. In addition, a number of challenges in finding the correct CPAG are related to sequences of connected two-cycles (see e.g.\ Figure \ref{fig3CET-4} ), which in larger fully random graphs are also exceedingly unlikely to appear.
%
%Therefore we tweak the random graph generating process to allow some control over the number and size of the strongly connected components. We introduce a 3-stage process parameterized by size $N$ and density $d$, as well as parameters $p_{two}$ for the proportion of two-cycles, and $p_{acy}$ and $p_{cyc}$ for the proportion of recursive resp.\ nonrecursive edges that remain:
%\begin{enumerate}
%\item randomly sample the required number of two-cycles, 
%\item add random arcs from lower to higher numbered nodes,
%\item add completely random arcs for the remaining edges.
%\end{enumerate}
%Afterwards a random permutation of the nodes is applied to ensure there is no implicit bias in the ordering.
%
%With this procedure, setting $[p_{two},p_{acy}, p_{cyc}] = [0,1,0]$ would lead to a random acyclic graph, whereas setting $[0.1,0.9,0]$ would lead to a random acyclic graph with some edges turned into two-cycles. Setting $[0,0,1]$ would lead to a fully random cyclic graph in the Erdos-Renyi model.
%In practice setting e.g.\ $[p_{two},p_{acy}, p_{cyc}] = [0.1,0.8,0.1]$ leads to a varied number and size of the strongly connected components for graphs of up to $N=200$ nodes with density $d=3.5$. For larger/higher density graphs the $p_{cyc}$ proportion should be reduced to avoid collapsing into the `one big cycle' trap. Additional implementation details can be found in the accompanying source code.

\subsection{Scaling behaviour}

\begin{figure}[h]
  \centering
  \includegraphics[width=1.0\linewidth]{fig4TimingResults.pdf}
  \caption{\small Log-log plot of the scaling behaviour of the original (red/magenta) and new (blue/green) CPAG algorithms as a function of the graph size $N$, for two densities $d\in \{3.0,5.0\}$. Solid lines indicate average performance over 100 instances, dashed lines the worst case encountered.} 
  \label{fig4PerfResults}
\end{figure}

Figure \ref{fig4PerfResults} shows the results for the two CPAG-from-graph procedures. As expected, the scaling behaviour of the new procedure in Algorithm \ref{alg:G-to-CPAG} is much more benign. 
%In addition, the original procedure spends the majority of its time in the relatively expensive d-separation tests (>99\%, evenly spread between steps a,c and f), whereas the new version is dominated by the much simpler 'find adjacent nodes' operation. 
As a result, for graphs of $N=200$ nodes with density $d=3.0$, the new procedure requires only about $0.05$\,s on average to construct the CPAG, whereas the original version takes about $78$\,s: a speed-up of roughly three orders of magnitude ($78/0.05 \approx 1.6\times 10^{3}$).

The supplement shows that the original CPAG-from-Graph procedure spends the vast majority of its time in the expensive \textit{d}-separation searches in stages (a) and (c), whereas for sparse graphs the new Graph-to-CPAG procedure spends roughly equal amounts of time in each phase. For denser graphs, the final stage of the new procedure starts to dominate, as expected from the complexity analysis in Section \ref{sub:complexity}.

Finally, note that for both algorithms there is not much difference between average and worst-case scaling behaviour on the randomly sampled graphs (the worst case is around $1.5$--$2.0$ times more expensive than the average for both versions), and both stay well below their theoretical worst-case bounds.
The reason is that reaching the dreaded `worst case' scenario requires very specific graph configurations that are extremely unlikely to occur in truly random graphs. A reassuring message of Figure \ref{fig4PerfResults} is therefore that handling even very large directed cyclic graphs is likely to remain feasible in practice, despite the imposing theoretical worst-case bound.

\section{Discussion}

We presented a new, ancestral perspective on the Cyclic Equivalence Theorem for directed graphs that resulted in an efficient procedure to obtain the CPAG from an arbitrary directed graph.

The resulting CPAGs can be compared directly to establish Markov equivalence between cyclic directed graphs, but so far we have made no attempt to derive \textit{all} invariant features shared by all (and only) the CMAGs in the same equivalence class. In other words, we have not yet aimed for the \textit{maximally informative} CPAG. As a result, not all identifiable cycles are guaranteed to appear in an easily recognisable form. Squeezing out all available information would likely require a set of additional orientation propagation rules, similar to those of augmented FCI \citep{Zhang2008}. 

The efficiency of the Graph-to-CPAG procedure in Algorithm \ref{alg:G-to-CPAG} also makes it fast enough to be a viable route for extending score-based greedy equivalence search algorithms such as GES \citep{Chickering2002} towards cyclic graphs, similar to recent extensions for acyclic graphs in the presence of confounders \citep{ClaassenB2022}. 

However, we consider the most promising aspect of our results to be the significantly reduced conceptual complexity afforded by the ancestral perspective. The new ancestral CET is notably simpler than the original version, and suggests a natural extension to cyclic models with confounders, analogous to that for MAGs.

Finally, under \textit{d}-separation the CMAG treats strongly connected components more similarly to the nonlinear case under $\sigma$-separation \citep{MooijC2020}, which suggests that the two approaches may be unified to handle arbitrary cyclic relationships in the near future. We hope this will encourage renewed work on extending available constraint-based algorithms towards sound and complete causal discovery in the presence of confounders, cycles, and selection bias.



%It may even be possible to simplify the CET further given the close similarity between the treatment of virtual \textit{v}-structures and \textit{u}-structures in both the CET rules and algorithm \ref{alg:G-to-CPAG}. Essentially virtual \textit{v}-structures are \textit{u}-structures with a single node mediating path, with the only difference that the latter do not explicitly include a test for `not a descendant of a common child'.

%Now bring it all together: find cyclic-expansion, expand CPC orientation rules, identify invariant cycle groups, do sound+complete procedure. Then expand to confounders in linear / d-separation case. Then combine with sigma-separation. 

%Possible applications to extend score-based greedy equivalence search \cite{Chickering2002, ClaassenB2022}.

%Orientation rules for completeness: 

%Recognising presence of cycles, mention? (nicely!)
%WRONG: CLAIM in Rich96 (Discovering cyclic causal structure), closing remark p19. It is NOT true that there is an acyclic graph that is d-sep equivalent IFF step c-f in CPAG-from-Graph algorithm trigger. It IS true that IF they trigger then there is no such DAG, but that does NOT imply 'only if'!
%Counter example: chordless 4 cycle.
%
%WRONG2: it also claims that step c is not entailed by any DAG, but this is wrong as the standard near-Y structure will make an orientation in step c, even though it is a perfect DAG.


%?

%\begin{contributions} % will be removed in pdf for initial submission 
%					  % (without ‘accepted’ option in \documentclass)
%                      % so you can already fill it to test with the
%                      % ‘accepted’ class option
%    Briefly list author contributions. 
%    This is a nice way of making clear who did what and to give proper credit.
%    This section is optional.
%
%    H.~Q.~Bovik conceived the idea and wrote the paper.
%    Coauthor One created the code.
%    Coauthor Two created the figures.
%\end{contributions}

%\begin{acknowledgements} % will be removed in pdf for initial submission,
%						 % (without ‘accepted’ option in \documentclass)
%                         % so you can already fill it to test with the
%                         % ‘accepted’ class option
%    Briefly acknowledge people and organizations here.
%
%    \emph{All} acknowledgements go in this section.
%\end{acknowledgements}

% References
\bibliography{uai2023-newCET}
\end{document}
