%\documentclass{uai2022} % for initial submission
\documentclass[accepted]{uai2022} % after acceptance, for a revised
                                    % version; also before submission to
                                    % see how the non-anonymous paper
                                    % would look like
%% There is a class option to choose the math font
% \documentclass[mathfont=ptmx]{uai2022} % ptmx math instead of Computer
                                         % Modern (has noticeable issues)
% \documentclass[mathfont=newtx]{uai2022} % newtx fonts (improves upon
                                          % ptmx; less tested, no support)
% NOTE: Only keep *one* line above as appropriate, as it will be replaced
%       automatically for papers to be published. Do not make any other
%       change above this note for an accepted version.

%% Choose your variant of English; be consistent
% usepackage[american]{babel}
\usepackage[british]{babel}

%% Some suggested packages, as needed:
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{apalike}
%    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
% \usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams

% GPS packages
\usepackage{amsfonts}       % blackboard math symbols
\usepackage{amsmath}       % multi-line eqns 
\usepackage{algorithm}
\usepackage{algorithmic}
\usepackage{tabularx}
\usepackage{multirow}
% \usepackage{algorithm2e}	% ?

%% Provided macros
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)

%% Self-defined macros
\newcommand{\swap}[3][-]{#3#1#2} % just an example


% ==== GPS specific packages/macros:
% the following file contains most of the article specific commands and symbols
\include{TMacros}
% ==== end GPS specific packages/macros


\title{Greedy Equivalence Search in the Presence of Latent Confounders}

% Add authors
\author[1]{\href{mailto:<Tom.Claassen@ru.nl>?Subject=GPS-UAI2022}{Tom Claassen}{}}
\author[1]{\href{mailto:<g.bucur@cs.ru.nl>?Subject=GPS-UAI2022}{Ioan~Gabriel~Bucur}{}}
% Add affiliations after the authors
\affil[1]{%
    Institute for Computing and Information Sciences\\
    Radboud University\\
    Nijmegen, The Netherlands
  }
  
\begin{document}
\maketitle

\begin{abstract}
We investigate Greedy PAG Search (GPS) for score-based causal discovery over equivalence classes, similar to the famous Greedy Equivalence Search algorithm, except now in the presence of latent confounders. It is based on a novel characterization of Markov equivalence classes for MAGs that not only improves state-of-the-art identification of Markov equivalence between MAGs to linear time complexity for sparse graphs, but also allows for efficient traversal over equivalence classes in the space of all MAGs. The resulting GPS algorithm is evaluated against several existing alternatives and found to show promising performance, both in terms of speed and accuracy.
\end{abstract}

\section{Introduction} \label{secIntro}
%\section{Background and terminology} \label{secBackground}
Ever since the advent in the early 90's of modern, principled methods for causal discovery from observational data, there have been two main paradigms that have been widely employed: constraint-based and score-based methodologies. Both start from the assumption that there is some underlying causal structure, typically in the form of a directed acyclic graph (DAG), that is responsible for the observed data distribution. 
The first class of methods then searches for (conditional) independence and dependence constraints between variables in the data, and uses this information in combination with certain orientation rules to reconstruct the output causal model.
Key assumptions include the \textit{causal Markov assumption}, essentially stating that the structure of the underlying graph induces independence constraints in the observed data according to the \textit{d}-separation criterion (see below), as well as the \textit{causal faithfulness assumption}, stating that these are also the only observable independencies in the data. 
Other simplifying model assumptions like acyclicity and causal sufficiency (no latent confounders) can also be employed. When causal sufficiency does not apply, the target causal model can be represented as a (maximal) ancestral graph (MAG, see below).
The output then represents the so-called Markov equivalence class (MEC) of the underlying causal model, in the form of a partial ancestral graph (PAG) representing all causal graphs that satisfy the same independence model. 
Benchmark examples of algorithms in this tradition include PC and FCI \citep{SpirtesGS2000}, where the latter is sound and complete even in the presence of latent confounders and selection bias.

In contrast, score-based approaches define a metric that quantifies how well a certain graph structure captures the observed data, and then iteratively search for a graph that maximizes this score. The score is typically based on a (Bayesian) likelihood in combination with a penalty on model complexity, and usually assumes an underlying DAG structure with no unobserved confounders. A classic example is the K2 algorithm by \cite{CooperH1992}.

In many cases, it is possible to choose a score in such a way that all graphs in the same equivalence class obtain the same score \citep{HeckermanGC1995}.
As there can be a huge number of graph instances in the same equivalence class, this opens up the possibility of significantly speeding up the search by moving between equivalence classes rather than between individual graphs. This was the motivation behind algorithms like GBPS \citep{SpirtesM1995}, and its famous successor GES (Greedy Equivalence Search) \citep{Chickering2002b}, as well as recent versions improving scaling behaviour and statistical efficiency \citep{Ramsey2017,Chickering2020}.
In practice, equivalence search significantly outperforms traditional graph-based search methods, both in speed and accuracy.
Due to the global nature of the score, their output also tends to be more robust than that of their constraint-based counterparts.
Unfortunately, like PC, they also assume causal sufficiency, meaning that there is currently no available method that can employ the full potential of score-based equivalence search in the presence of latent confounders.
Addressing this gap is the focus of this article.


\textbf{Towards equivalence search for MAGs}

There have been several related score-based methods in recent years that try to go beyond the standard DAG search. For example, \cite{TriantafillouT2016} consider the relative performance of constraint-based methods vs.\ MAG search using the BIC score for multivariate Gaussian distributions from \cite{RichardsonS2002}. Their GSMAG algorithm employed a greedy search over the space of MAGs, where at each step all possible single edge modifications were evaluated. Later results showed this could be improved by starting from the MMPC skeleton \citep{Tsirlis2018}. GSMAG was found to have promising performance, albeit at much greater running times.

A different approach was taken by \cite{OgarrioSR2016}. They managed to circumvent the MAG equivalence search by exploiting the original GES to first do equivalence search in the space of DAGs, and then adding a post-processing step using a modification of FCI that started from the GES output in order to obtain the final PAG. The result was a hybrid method (GFCI, short for Greedy FCI) that showed promising performance over either method separately, but did not exploit the potential of full PAG search.

\cite{Bhattacharya2021} presented a radically different alternative that tackles the even wider class of ancestral ADMGs by exploiting differentiable algebraic constraints to turn causal discovery into a continuous optimization problem.

In the meantime many transformational characterizations of MAGs have been developed, see e.g.\ \citep{Tian2012,ZhangS2012}, showing that all MAGs within the same equivalence class can be reached from one another by a series of (covered) edge reversals, where every intermediate MAG is part of the same MEC. But as these characterizations are primarily concerned with transformations \textit{within} the same equivalence class, they are not easy to generalize into an orthogonal search strategy \textit{between} equivalence classes.

Our solution to this problem is based on a novel MEC characterization for MAGs that does not rely on complicated paths but on straightforward collider/noncollider triples. Any change to these triples implies a new MEC, which makes it easy to generate a collection of neighbouring MECs. In combination with an appropriate score this then forms the main engine in our Greedy PAG Search (GPS) algorithm for score-based equivalence search in the presence of latent confounders.

The rest of the article is organised as follows: section 2 introduces some basic concepts and terminology, section 3 describes the new characterization for Markov equivalence between MAGs, section 4 discusses how to use this for traversal between equivalence classes in the MAG space, ultimately leading to the GPS algorithm in section 5. Section 6 then shows the performance of GPS in practice compared to some state-of-the-art alternatives.

\section{Notation and terminology} \label{sec:notation}

A \textit{mixed graph} $\G$ is a graphical model that can contain three types of edges between pairs of nodes: directed ($\rightarrow$), bidirected ($\leftrightarrow$), and undirected ($\relbar$). %, where the end marks are known as tail `$-$' and arrowhead `$>$'. 
In a mixed graph, standard graph-theoretical notions, e.g.\ \textit{child/parent}, \textit{ancestor/descendant}, \textit{directed path, cycle}, still apply, with natural extension to sets. 
A vertex $z$ is a \textit{collider} on a path $\pi = \seq{\ldots,x,z,y,\ldots}$ if there are arrowheads at $z$ on both edges from $x$ and $y$, otherwise it is a \textit{noncollider}.  A triple $x - z - y$ on a path is \textit{unshielded} if $x$ and $y$ are not adjacent in $\G$. An unshielded collider is known as a \textit{v-structure}.

A mixed graph $\G$ is \textit{ancestral} iff an arrowhead at $x$ on an edge to $y$ implies there is no directed path from $x$ to $y$ in $\G$, and there are no arrowheads at nodes with undirected edges. As a result, arrowhead marks can be read as `is not an ancestor of'.
In a mixed graph $\G$, a vertex $x$ is \textit{m-connected} to $y$ by a path $\pi$, relative to a set of vertices $Z$, iff every noncollider on $\pi$ is not in $Z$, and every collider on $\pi$ is an ancestor of $Z$. If there is no such path, then $x$ and $y$ are \textit{m-separated} by $Z$. 
An ancestral graph is \textit{maximal} (MAG) if for any two nonadjacent vertices there is a set that m-separates them.
A \textit{directed acyclic graph} (DAG) is a special kind of MAG, containing only $\rightarrow$ edges, for which \textit{m}-separation reduces to the standard \textit{d}-separation criterion.
For more details, see \citep{KollerF2009, SpirtesGS2000}.
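
\textit{Worked example (m-separation).} As a minimal illustration of the definitions above, consider two hypothetical three-node graphs that are not part of the experiments in this article, each with $x$ and $y$ nonadjacent, so that $\pi = \seq{x,z,y}$ is the only path between them. We use the common convention that every vertex counts as an ancestor of itself.
\begin{itemize}
\item[-] In $x \rightarrow z \leftrightarrow y$ the vertex $z$ is a collider on $\pi$. For $Z = \emptyset$ the collider is not an ancestor of $Z$, so $x$ and $y$ are m-separated by $\emptyset$. For $Z = \{z\}$ the collider is in $Z$, so $x$ and $y$ are m-connected given $\{z\}$.
\item[-] In $x \rightarrow z \rightarrow y$ the vertex $z$ is a noncollider on $\pi$. For $Z = \emptyset$ the noncollider is not in $Z$, so $x$ and $y$ are m-connected. For $Z = \{z\}$ the path is blocked, so $x$ and $y$ are m-separated by $\{z\}$.
\end{itemize}
Both graphs are ancestral and maximal, and the second is a DAG, for which the statements coincide with ordinary \textit{d}-separation.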

A \textit{causal DAG} $\G_C$ is a directed acyclic graph where the arcs represent direct causal interactions \citep{Pearl2009}.
In general, the independence relations between observed variables in a causal DAG can be represented in the form of a MAG \citep{RichardsonS2002}. 
The (complete) partial ancestral graph (PAG) represents all invariant features that characterize the equivalence class $[\G]$ of such a MAG: an edge carries a tail `$-$' or arrowhead `$>$' end mark iff that mark is invariant across all members of $[\G]$, and a circle mark `$\circ$' otherwise.


\section{Characterizing Markov equivalence classes} \label{sec:CharacterizeMEC}
In this section we introduce a modified characterization for the Markov equivalence class (MEC) of MAGs that will form the basis for the equivalence search in the next section. It also leads to a simple method to establish Markov equivalence between MAGs.

\subsection{MECs of MAGs} 
For Markov equivalence between MAGs we start from the following characterization from \cite{AliRS2009}:
\begin{lem} \label{lemMarkovEquivAli}
Two MAGs $\G_1$ and $\G_2$ belong to the same Markov equivalence class if and only if they have the same skeleton and the same colliders with order.
\end{lem}
This reflects the well known characterization for DAGs where two members are in the same equivalence class iff they have the same skeleton and \textit{v}-structures, with the latter now generalized to `collider triples with order': 

\begin{dfn} \label{defTripleWithOrder}
Let $\mfT_i (i \geq 0)$ be the set of \emph{triples of order $i$} in a MAG $\G$, defined recursively as:
%\vspace{-4mm}
\begin{itemize}
\item[-] A triple $\seq{a,b,c} \in \mfT_0$ if $a \mem b \mem c$ is in $\G$, with $a$ and $c$ not adjacent. 
\item[-] A triple $\seq{a,b,c} \in \mfT_{i \geq 1}$ if $\seq{a,b,c} \notin \mfT_{j < i}$, and there is a discriminating path $\seq{x,q_1,\ldots,q_p,a,b,c}$ for $b$ in $\G$ (possibly $q_1 = a$), where the $p+1$ colliders\\
$\seq{x,q_1,q_2}, \ldots, \seq{q_{p-1},q_p,a}, \seq{q_p,a,b} \in \bigcup\limits_{j < i} \mfT_j$.
\end{itemize}
\end{dfn}

Here a path $\pi = \seq{x,q_1,\ldots,q_p,a,b,c}$ in $\G$ is a \textit{discriminating path} for $b$ iff $x$ is not adjacent to $c$, and every vertex between $x$ and $b$ is a collider along $\pi$ and a parent of $c$ in $\G$. For example, in Figure \ref{figDiscriminatingMAG}, $\seq{A,B,C,E}$ is a discriminating path for $C$, and $\seq{A,B,C,D,E}$ is a discriminating path for $D$.

Note that triples $\seq{a,b,c}$ and $\seq{c,b,a}$ are equivalent, and that triples with order $i \geq 1$ are triangles in $\G$. Also note that the final condition is only needed to uniquely determine the order $i$, but that the characterization itself does not depend on the actual value.
This characterization leads to an algorithm for testing Markov equivalence between two MAGs with polynomial complexity $O(ne^4)$, with $n$ the number of vertices and $e$ the number of edges in the graph.

More recently, \cite{HuE2020} came up with a characterization in terms of a parameterizing set $\cS_3(\G)$ based on so-called \textit{heads} and \textit{tails} of the districts (connected bidirected components) in $\G$, and the `3' indicates only sets of up to 3 nodes are required. In contrast with \citep{AliRS2009} it does \textit{not} rely on the discriminating path, and leads to an even more efficient algorithm for checking equivalence that runs in $O(ne^2)$ for sparse graphs (when $n = O(e)$). Unfortunately, this characterization is difficult to translate into a comprehensive search strategy between equivalence classes.

However, it turns out that we can also circumvent the discriminating path in Definition \ref{defTripleWithOrder} in another way.

\subsection{A new `triples with order' characterization} \label{sub:NewTriplesWithOrder}
On closer inspection of the second part of Definition \ref{defTripleWithOrder} we see that every discriminating path (see Figure \ref{figDiscriminatingMAG}) can be viewed as a collection of collider and noncollider triples with order. More importantly, to know that a path $\seq{x,q_1,..,q_p,z,y}$ is a valid discriminating path for $z$ in $\G$ it suffices to know that $\seq{x,q_1,..,q_{p-1},q_p,y}$ is a valid discriminating path for noncollider $q_p$ along the path, and that $\seq{q_{p-1},q_p, z}$ is a collider, and that $z$ and $y$ are adjacent in $\G$. But that also means we do not actually need the full discriminating path, but we just need to know that $\seq{q_{p-1},q_p,y}$ is a noncollider with order, and that $\seq{q_{p-1},q_p,z}$ is a collider with order. This results in the following alternative characterization: 

\begin{dfn} \label{defNewTripleWithOrder}
Let $\mfC_i$ resp. $\mfD_i$ $(i \geq 0)$ be the set of \textit{collider-} resp. \textit{noncollider triples with order} $i$ in a MAG $\G$, defined recursively as:
%\vspace{-4mm}
\begin{itemize}
\item[-] A triple $\seq{a,b,c} \in \mfC_0$ (resp. $\mfD_0$), if $a \mem b \mem c$ is an unshielded collider (resp. noncollider) in $\G$.
\item[-] A triple $\seq{a,b,c}  \in \mfC_{i}$ (resp. $\mfD_{i}$), with $i \geq 1$, if $\seq{a,b,c} \notin \mfC_{j < i}$ (resp. $\mfD_{j < i}$), and
\begin{enumerate}
\item $a \mem b \mem c$ is a collider (noncollider) in $\G$, 
\item $\exists q: \seq{q,a,b} \in \mfC_{j < i}$, and $\seq{q,a,c} \in \mfD_{k < i}$.
\end{enumerate}
\end{itemize}
\end{dfn}

\begin{figure}
  \centering
  \includegraphics[width=0.5\linewidth,page=3]{fig1_DiscriminatingPath.png}
  \caption{\small MAG with discriminating paths A-B-C-(D)-E.} 
  \label{figDiscriminatingMAG}
\end{figure}

\begin{table} 
\centering
\begin{tabular}{c c}
\begin{tabular}{|r|r|r|r|} \hline
$k$ & \multicolumn{3}{c|}{$\mfC$} \\ \hline
0 & A & B & C  \\
0 & B & C & D  \\
0 & B & E & D  \\
2 & C & D & E  \\
\hline
\end{tabular} &
\begin{tabular}{|r|r|r|r|} \hline
$k$ & \multicolumn{3}{c|}{$\mfD$} \\ \hline
0 & A & B & E  \\
1 & B & C & E  \\
\hline
\end{tabular} \\
\end{tabular}
\caption{\small Corresponding `triples with order' lists.} \label{tabMEC}
\end{table}

The connection to the original `triple with order' definition follows from the next lemma (proof in the supplement):

\begin{lem} \label{lemNewOldTripleEquiv}
In a MAG $\G$, a triple $\seq{a,b,c}$ is in $\mfC_{i}$ (resp. $\mfD_i$), if and only if $\seq{a,b,c} \in \mfT_{i}$ and $\seq{a,b,c}$ is a collider (resp. noncollider) in $\G$.
\end{lem}


This motivates the following definition:
\begin{dfn} The MEC $\M$ of a MAG $\G$, denoted $\M(\G)$, is defined as the triplet $\seq{\cS, \mfC, \mfD}$, with $\cS$ the (undirected) skeleton of $\G$, and $\mfC$ and $\mfD$ the corresponding lists of collider resp.\ noncollider triples with order from Definition \ref{defNewTripleWithOrder}.
\end{dfn}
This leads to the straightforward implication:
\begin{cor} \label{corMECequiv}
Two MAGs $\G_1$ and $\G_2$ are Markov equivalent if and only if $\M(\G_1) = \M(\G_2)$.
\end{cor}
From here on we will use the term MEC to denote this particular representation of the Markov equivalence class of a MAG $\G$.

\subsection{From MAG to MEC} \label{sub:MAGtoMEC}

Definition \ref{defNewTripleWithOrder} implies that once we have established the unshielded (non)collider triples with order 0, we only need to check the already constructed lists and a specific (non)collider triple in the graph $\G$ in order to identify each higher order triple. This leads to the following \textbf{MAG-to-MEC} procedure:

%\begin{algorithm}[tb]
\begin{algorithm}[h]
  \caption{MAG-to-MEC}   \label{alg:MAG-to-MEC}
\begin{algorithmic}		% or algpseudocode?
   \STATE{\bfseries Input:} MAG $\G$
   \STATE{\bfseries Output:} MEC $\{\cS,\mfC,\mfD\}$
   \STATE \textit{phase 1: initialise, process unshielded triples}
   \STATE $\cS \leftarrow Skeleton(\G)$
   \STATE $\mfC_0 / \mfD_0 \leftarrow$ unshielded (non)colliders $\seq{x,z,y} \in \G$
   \FORALL{$\seq{x,z,y} \in \mfD_0$}
   \FORALL{$q$ with $\seq{x,z,q} \in  \mfC_0$ and $\G(q,y) > 0$}
   \STATE add $\seq{z,q,y}$ to $\mfL$ \COMMENT{\textit{initialise process list $\mfL$}}
   \ENDFOR
   \ENDFOR
   %\COMMENT{process candidate triples until no more left}
   \STATE \textit{phase 2: process candidate triples until no more left}
   \WHILE{$\mfL$ is not empty}
   \STATE $\seq{x,z,y} \leftarrow Pop(\mfL)$
   \IF{$x \mea z \aem y$ in $\G$}
   \STATE add $\seq{x,z,y}$ to $\mfC$
   \STATE $\forall q: \seq{x,z,q} \in \mfD$, $\G(q,y) >0$: add $\seq{z,y,q}$ to $\mfL$
   \ELSE
   \STATE add $\seq{x,z,y}$ to $\mfD$
   \STATE $\forall q: \seq{x,z,q} \in \mfC$, $\G(q,y) >0$: add $\seq{z,q,y}$ to $\mfL$
   \ENDIF
   \ENDWHILE
   \RETURN   $\cS, \mfC, \mfD$
\end{algorithmic}
\end{algorithm}

Algorithm \ref{alg:MAG-to-MEC} gives a high-level overview of the corresponding steps (implementation details are available at \url{https://github.com/tomc-ghub/gps_uai2022}).
It starts by identifying all unshielded triples (order 0) and allocating them to the appropriate collider or noncollider lists. After that, all triples with order 1 are collected in list $\mfL$, and processed one by one depending on whether they correspond to a collider or noncollider in the graph. Each allocated triple may give rise to new triples with order that are added to the end of the list $\mfL$, until we have found them all. For each processed triple (allocated to $\mfC$ or $\mfD$) we only need to consider the existence of matching triples in the complementary list together with the presence of a specific edge in the MAG to find the new implied (higher order) triples. Table \ref{tabMEC} shows the output $\mfC$ and $\mfD$ lists given the MAG in Figure \ref{figDiscriminatingMAG}.
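
\textit{Worked trace.} The following hand trace sketches how Algorithm \ref{alg:MAG-to-MEC} would arrive at the lists in Table \ref{tabMEC}. It assumes only what the table and the two discriminating paths $\seq{A,B,C,E}$ and $\seq{A,B,C,D,E}$ imply about the MAG in Figure \ref{figDiscriminatingMAG}, namely that $C$ and $D$ are both adjacent to $E$. It is meant as an illustration, not as a description of the reference implementation.
\begin{itemize}
\item[-] \textit{Phase 1.} The unshielded triples give $\mfC_0 = \{\seq{A,B,C}, \seq{B,C,D}, \seq{B,E,D}\}$ and $\mfD_0 = \{\seq{A,B,E}\}$. For $\seq{x,z,y} = \seq{A,B,E}$ the only $q$ with $\seq{A,B,q} \in \mfC_0$ is $q = C$, and $C$ is adjacent to $E$, so $\mfL = [\seq{B,C,E}]$.
\item[-] \textit{Phase 2, first pop.} The triple $\seq{B,C,E}$ is a noncollider in $\G$, so it is added to $\mfD$ with order 1. The matching entry $\seq{B,C,D} \in \mfC$, with $D$ adjacent to $E$, adds $\seq{C,D,E}$ to $\mfL$.
\item[-] \textit{Phase 2, second pop.} The triple $\seq{C,D,E}$ is a collider in $\G$, so it is added to $\mfC$. It depends on the order 1 triple $\seq{B,C,E}$ and therefore has order 2. There is no entry $\seq{C,D,q}$ in $\mfD$, so nothing new is added, $\mfL$ is empty, and the procedure stops.
\end{itemize}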

\subsection{From MEC back to MAG}
For the reverse \textbf{MEC-to-MAG} direction we can directly map all triples with order into specific (minimal) edge mark orientations to obtain the so-called \textbf{core PAG} (Definition \ref{defCorePAG}), and then propagate the remaining implied orientations using a subset of the standard FCI orientation rules from \cite{Zhang2008} to obtain the completed PAG. 
From there we can obtain a matching MAG instance by following, e.g.\ the arc-augmentation procedure in Theorem 2 of \citep{Zhang2008} which will result in a fully oriented MAG in the same MEC with a minimum number of (invariant) bidirected and undirected edges. 

\begin{dfn} \label{defCorePAG} 
\textbf{(core PAG)} For a MEC $\M = \seq{\cS, \mfC, \mfD}$, the core PAG $\cP^*$ is defined as the graph obtained from the skeleton $\cS$ with all $\cec$ edges, in combination with
\begin{itemize}
\item[-] $\forall \seq{x,z,y} \in \mfC_0~~~~:$ orient $x \mea z \aem y$ in $\cP^*$
\item[-] $\forall \seq{x,z,y} \in \mfC_{k \geq 1}:$ orient $z \aem y$ in $\cP^*$
\item[-] $\forall \seq{x,z,y} \in \mfD_{k \geq 1}:$ orient $z \tem y$ in $\cP^*$
\end{itemize}
\end{dfn}
Each collider with order 0 becomes a \textit{v}-structure, and each triple with order $k \geq 1$ corresponds to exactly one invariant edge mark (arrowhead or tail) in the graph. Note that in processing triples $\seq{x,z,y}$ with order $k \geq 1$, we rely on the fact that they are stored in the lists such that the $y$ entry corresponds to the final node in a discriminating path, which is easily done when constructing the MEC. % (see supplement; or actually 'automatically follows from Algorithm 1'). 
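
\textit{Worked example (core PAG).} As an illustration, applying Definition \ref{defCorePAG} literally to the lists in Table \ref{tabMEC} would give the following orientations, starting from a skeleton in which every edge is $\cec$:
\begin{itemize}
\item[-] from $\mfC_0$: $A \mea B \aem C$, $B \mea C \aem D$, and $B \mea E \aem D$,
\item[-] from $\seq{C,D,E} \in \mfC_2$: $D \aem E$,
\item[-] from $\seq{B,C,E} \in \mfD_1$: $C \tem E$.
\end{itemize}
Combining the marks per edge, $B$ and $C$ receive an arrowhead at both ends ($B \aea C$), as do $D$ and $E$ ($D \aea E$), while the edge between $C$ and $E$ obtains a tail at $C$. All marks not listed here remain circles in $\cP^*$ and are left for the orientation rules to resolve.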

The justification for the notion of a `core PAG' is that the resulting graph contains all invariant information needed to uniquely establish the full, completed PAG, %i.e.\ with the maximal amount of invariant edge marks, 
by only propagating the graphical FCI orientation rules, i.e.\ \textit{without} the need for specific independence test results as required by \textit{v}-structure rule $\R0$ and the discriminating path rule $\R4$ in \citep{Zhang2008}.

However, it is possible that an invariant tail is hiding in a higher order noncollider triple that is not a `triple with order', hence we need one more rule to ensure completeness. As it fulfils much the same role as the original $\R4$, we will refer to it as rule $\R4'$:

$\R4'$:~~Let $Z$ be a district among the parents of a node $y$. If $x \mea z \tea y$, with $z \in Z$ and $x$ and $y$ not adjacent, then orient all $u \mea y$ with $u \mea z'$ for some $z' \in Z$ (possibly $z' = z$) as $u \tea y$.

\begin{algorithm}[h]
  \caption{MEC-to-CPAG}   \label{alg:MEC-to-PAG}
\begin{algorithmic}		% or algpseudocode?
   \STATE{\bfseries Input:} MEC $\{\cS,\mfC,\mfD\}$
   \STATE{\bfseries Output:} completed PAG $\cP$
   \STATE $\cP \leftarrow \cP^*(\cS,\mfC,\mfD)$~~~~~~~(\textit{the core PAG from Definition \ref{defCorePAG}})
   \STATE run orientation rules $\R1-\R4'$ on $\cP$~~~(\textit{all arrowheads})
   \STATE run orientation rules $\R5-\R10$ on $\cP$~(\textit{remaining tails})
   \RETURN $\cP$
\end{algorithmic}
\end{algorithm}


The following lemma ensures the output is indeed sound and complete:
\begin{lem} \label{lemMECtoPAGsound}
For a valid MEC $\M$, algorithm \ref{alg:MEC-to-PAG} will output the corresponding completed PAG $\cP$.
\end{lem}


\subsection{Algorithmic complexity} \label{sub:AlgComplexity}
Checking for Markov equivalence between MAGs simply corresponds to building the MEC for one, and verifying that the same steps apply to the other. This will induce a constant cost for each entry in the MEC, and so the algorithmic complexity for increasing graph sizes is determined by the complexity of building the MEC from a given MAG.

To estimate the worst-case time complexity of Algorithm \ref{alg:MAG-to-MEC}, consider graphs over $n$ nodes with $e$ edges and maximum node degree $d$.
For sparse graphs with $d \leq k$ for some constant $k$ we have $e = O(n)$, whereas in general we can have $e = O(n^2)$.

The first phase of the algorithm requires finding all unshielded triples, which means selecting all pairs of nodes from the neighbours of every node in the graph, leading to $n \cdot d \cdot (d-1) = O(n d^2)$ triples. 
For the initialization of the temporary triple list $\mfL$ we need to check all triples $\seq{x,y,z}$ in  $\mfC_0$, and compare with specific entries in the complementary list $\mfD_0$ (or vice versa) for nodes adjacent to $z$ in $\G$. With appropriate indexing that implies an additional $d$ candidates to check for each entry in the smaller of the two lists, bringing the total for phase 1 to $O(n d^3)$.

Each entry in the temporary list is then processed and compared against $d$ other candidates, each of which can be handled in constant time as it involves only verifying presence in one of the (non)collider triple lists, which can again be done in constant time using appropriate indexing, and the presence of a specific edge in $\G$, also in constant time.
Each combination added %will become a triple with order $k \geq 1$, and therefore 
corresponds to a triangle in the graph, meaning there are at most $O(n d^2)$ triples to process, where each requires checking $d$ entries, again leading to a combined total of $O(n d^3)$ steps for phase 2. 

Together that means for sparse graphs we have worst case linear complexity of $O(n)$ (!), whereas in general this leads to $O(n^4)$.
This is actually a significant improvement over the $O(n e^2)$ complexity reported by \cite{HuE2020}, corresponding to $O(n^3)$ for sparse graphs and $O(n^5)$ for arbitrary density (when $e = O(n^2)$). 
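
\textit{Worked derivation.} The bounds above can be collected in a few lines. The symbols $T_1$ and $T_2$ for the cost of phase 1 and phase 2 are shorthand introduced here for illustration only:
\begin{align*}
T_1 &= O(n d^2) + O(n d^2) \cdot d = O(n d^3), \\
T_2 &= O(n d^2) \cdot d = O(n d^3), \\
T_1 + T_2 &= O(n d^3).
\end{align*}
For sparse graphs with $d \leq k$ this gives $O(n k^3) = O(n)$, and for arbitrary density with $d \leq n-1$ it gives $O(n \cdot n^3) = O(n^4)$. For comparison, substituting into $O(n e^2)$ yields $O(n \cdot n^2) = O(n^3)$ when $e = O(n)$, and $O(n \cdot n^4) = O(n^5)$ when $e = O(n^2)$.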

These complexity results relate to the worst-case scaling behaviour, and in practice the typical performance may scale much better. For example the empirical complexity for sparse graphs in \cite{HuE2020} seemed much closer to our linear result, meaning that in practice the two characterizations may be expected to perform similarly (see section \ref{sub:MAGMECeval}). 
The main contribution of our new representation therefore lies in the way it enables us to traverse the MEC/PAG space in the next section.


\section{Moving between PAGs} \label{secEquivSearch}
The main goal in this article is to find a search strategy that allows us to move directly from one equivalence class to another, as the basis for an iterative (greedy) score-based causal discovery algorithm, similar in spirit to GES for DAGs \citep{Chickering2002a}, but now in the presence of latent confounders.
For that we need a principled way to generate a set of new candidate neighbouring equivalence classes from a given starting equivalence class. 
Key aspects here are deciding \textit{what} to change, and then \textit{how} to change it, in order to ensure the resulting target corresponds to a different but valid equivalence class. 

For the `what', the new characterization in terms of the MEC $\M$ provides a natural starting point, as any change by definition leads to a new equivalence class. This suggests the following basic operators:

\begin{itemize}
\item[-] \textbf{AddEdge} - insert an edge between two %nonadjacent 
nodes in $\cS$,
\item[-] \textbf{DeleteEdge} - remove a single edge from $\cS$,
\item[-] \textbf{MakeNoncollider} - move a triple $\seq{x,z,y}$ in $\mfC$ to $\mfD$,
\item[-] \textbf{MakeCollider} - move a triple $\seq{x,z,y}$ in $\mfD$ to $\mfC$.
\end{itemize}

However, this does not fully answer the `how' yet. A single application on a MEC of one of the operators above can lead to many implied changes, creating as well as destroying other `triples with order'. For example, in Figure \ref{figDiscriminatingMAG}, turning collider triple $\seq{A,B,C}$ into a noncollider would imply the destruction of both higher order triples in Table \ref{tabMEC}, leading to the PAG in Figure \ref{figModifiedPAG}. 
At the same time, not all operators can act in isolation, e.g.\ if two or more triples share an edge in the PAG, and some changes may be invalid, e.g.\ if it would introduce an invariant arrowhead at a node on an undirected edge in the PAG. 
To avoid such inconsistencies and recognise which triples should be modified in conjunction, we implement the operators to act directly on the PAG $\cP$.
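
\textit{Worked example (implied changes).} The cascade in the example above can be read off Definition \ref{defNewTripleWithOrder}, assuming the adjacencies suggested by Table \ref{tabMEC} (the neighbours of $B$ are $A$, $C$, $E$, and those of $C$ are $B$, $D$, $E$):
\begin{itemize}
\item[-] $\seq{B,C,E}$ has order 1 only because of a witness $q$ with $\seq{q,B,C} \in \mfC$ and $\seq{q,B,E} \in \mfD$. The only candidate is $q = A$. Once $\seq{A,B,C}$ is moved to $\mfD$ this witness is lost, so $\seq{B,C,E}$ is no longer a triple with order.
\item[-] $\seq{C,D,E}$ has order 2 only because of a witness $q$ with $\seq{q,C,D} \in \mfC$ and $\seq{q,C,E} \in \mfD$. The only candidate is $q = B$, which relied on $\seq{B,C,E} \in \mfD_1$. With that entry gone, $\seq{C,D,E}$ is no longer a triple with order either.
\end{itemize}
A single operator application on an order 0 triple can thus remove every higher order entry that depends on it, directly or indirectly.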

\begin{figure}
  \centering
  \includegraphics[width=0.5\linewidth,page=3]{fig3_ModifiedPAG.png}
  \caption{\small PAG after $MakeNoncollider(A,B,C)$ on Figure \ref{figDiscriminatingMAG}.} 
  \label{figModifiedPAG}
\end{figure}

That leaves the problem of how to convert the resulting modified graph into a valid equivalence class, as the standard FCI orientation rules do not suffice given that certain invariant edge marks in the starting PAG may no longer be invariant in the target PAG. 
Fortunately, here too the new MEC characterization comes to the rescue. On closer inspection we see that we could equally well use a PAG as input for the MAG-to-MEC procedure in  Algorithm \ref{alg:MAG-to-MEC}, or indeed the modified graph $\G$ resulting from applying an operator on the PAG $\cP$.

The main challenge that remains is that an operator may introduce a level of ambiguity through newly created triples with order that are \textit{not} fully determined by the modified graph $\G$. For example, given the PAG in Figure \ref{figModifiedPAG}, executing the reverse operator $MakeCollider(A,B,C)$ implies that $\seq{B,C,E}$ is a new triple with order, but we have no information on whether it should be a collider or a noncollider, and indeed both options would lead to a valid PAG. 

In the baseline implementation of our operators, below, we resolve this ambiguity by constructing a specific instance for the $Add/DeleteEdge$ operators that is guaranteed to be valid, and choosing a default `noncollider' option for all remaining undetermined higher order triples.

Having reconstructed the modified MEC $\M'$ we can use Algorithm \ref{alg:MEC-to-PAG} to obtain the corresponding PAG $\cP'$, and expand to an (arc-augmented) MAG instance $\G'$. This MAG can subsequently be used to validate the output.
The resulting procedure is depicted in Algorithm \ref{alg:PAG-neighbours}.

%Algorithm \ref{alg:PAG-neighbours} shows the steps in more detail:
\begin{algorithm}[h]
  \caption{PAG Candidate Neighbours}   \label{alg:PAG-neighbours}
\begin{algorithmic}		% or algpseudocode?
   \STATE{\bfseries Input:} MEC $\M$, PAG $\cP$, active Operators
   \STATE{\bfseries Output:} collection of PAG $\{Neighbours\}$
   \FORALL{active Operators, target edges/triples in $\M$}
   \STATE $\G \gets Operator(\cP, target)$ ~~~~~\textit{(modified graph)}
   \STATE $\M' \gets MAG\_to\_MEC(\G)$  ~~ \textit{(rebuild MEC, Alg.\ref{alg:MAG-to-MEC})}
   \STATE $\cP' \gets MEC\_to\_PAG(\M')$  ~ \textit{(expand)}
   \STATE $\G' \gets PAG\_to\_MAG(\cP')$ ~~~ \textit{(arc-augmentation)}
   \IF{$IsValidMAG(\G')$}
   \STATE $Neighbours\{end+1\} \gets \{\M',\cP',\G'\}$ ~~\textit{(add)}
   \ENDIF
   \ENDFOR
   \RETURN  $\{Neighbours\}$
\end{algorithmic}
\end{algorithm}


\subsection{Baseline operator implementation} \label{sub:operators}
The potential for inconsistencies and ambiguity from applying operators arbitrarily to a PAG means that a form of validation is necessary to ensure we obtain meaningful candidate neighbour PAGs at each step of the algorithm.

Unfortunately, determining a sound and complete expansion of a PAG with arbitrary background information (the modified graph $\G$ in Algorithm \ref{alg:PAG-neighbours}) is still an open problem. In combination with the potential ambiguity from undetermined higher-order triples, this means we cannot (yet) provide a full a priori `if and only if' validity check for all of the operators. However, we can incorporate some basic checks to ensure we do not try candidates that will lead to obvious inconsistencies. In the experimental results in Section 6 we will find that this already filters out the vast majority of invalid candidate PAGs.

The validity checks for the \textit{MakeCollider} and \textit{MakeNoncollider} operators simply verify that the edge marks in the modified graph $\G$ would not violate the definition of a valid MAG (no arrowheads at nodes on undirected edges, and no (almost) directed cycles). 
The \textit{MakeNoncollider} operator is the most involved, as there we need to consider multiple versions to create a noncollider triple, possibly leading to multiple, different output PAGs: three versions for order 0 colliders: $x \met z$, $z \tem y$, and $x \met z \tem y$, and one for higher order: $z \tea y$.
The \textit{AddEdge} and \textit{DeleteEdge} operators are constructed such that they are always valid, and so do not require an explicit validity check prior to execution. Both are based on a tail-augmented MAG instance \citep{Zhang2006} of the source PAG $\cP$, which needs to be derived only once per iteration.

$AddEdge(\cP,x,y)$: Let $\G$ be a tail-augmented MAG instance of $\cP$. If both $x$ and $y$ have no arrowheads in $\G$, then add $x \tet y$ to $\G$. Otherwise, if $x$ does not have an arrowhead in $\G$, but $y$ does, then add $x \tea y$ (or v.v.). Otherwise, if $x \in An(y)$ in $\G$ then add $x \tea y$, if $y \in An(x)$ then add $x \aet y$, if neither then add $x \aea y$ to $\G$.
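
The case distinction for $AddEdge$ can be read as the following pseudocode sketch. It only restates the rule above in the notation of Algorithm \ref{alg:PAG-neighbours}. The requirement that $x$ and $y$ are nonadjacent, and the explicit branch for `vice versa', are our reading of the text rather than additional checks.

\begin{algorithmic}
   \STATE{\bfseries Input:} PAG $\cP$, nonadjacent nodes $x$, $y$ \textit{(assumed)}
   \STATE $\G \gets$ tail-augmented MAG instance of $\cP$
   \IF{neither $x$ nor $y$ has an arrowhead in $\G$}
   \STATE add $x \tet y$ to $\G$
   \ELSIF{$x$ has no arrowhead in $\G$, but $y$ does}
   \STATE add $x \tea y$ to $\G$
   \ELSIF{$y$ has no arrowhead in $\G$, but $x$ does}
   \STATE add $x \aet y$ to $\G$ ~~\textit{(vice versa)}
   \ELSIF{$x \in An(y)$ in $\G$}
   \STATE add $x \tea y$ to $\G$
   \ELSIF{$y \in An(x)$ in $\G$}
   \STATE add $x \aet y$ to $\G$
   \ELSE
   \STATE add $x \aea y$ to $\G$
   \ENDIF
   \RETURN $\G$
\end{algorithmic}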

$DeleteEdge(\cP,x,y)$: Let $\G$ be a tail-augmented MAG instance of $\cP$. Remove edge $x \mem y$ from $\G$.

$MakeCollider(\cP,x,z,y)$: Check for no other $u \tet z$ in $\cP$. 
Let $\G$ be the graph from setting $x \mea z \aem y$ in $\cP$. Check there is no (almost) directed cycle in $\G$ involving $x$, $y$ and $z$.
 
$MakeNoncollider(\cP,x,z,y)$: (Order 0, version 1): If there is no collider triple $u \mea z \aem y$ then skip (equivalent to version 3). Check there is no $x \tea z$ in $\cP$ (arrowhead at undirected edge), and check that $x \aet z$ would not be part of an (almost) directed cycle. Create $\G$ by setting $x \met z$ in $\cP$. 
(Version 2): Idem for $z \tem y$.
(Order 0, version 3): Check if $x \tea z$, then no $u \mea x$ in $\cP$; idem if $y \tea z$ then no $u \mea y$; if either then also check no $u \mea z$. Create $\G$ by setting $x \met z \tem y$ in $\cP$. If $x \aet z$ in $\cP$ then check it is not part of an (almost) directed path in $\G$. Idem for $z \tea y$.
(Higher order): Check $z \tea y$ would not be part of an almost directed cycle in $\cP$. Create $\G$ by setting $z \tea y$ in $\cP$.

In principle the four operators suffice to traverse the entire MEC/PAG space, although that is naturally no guarantee the optimal model will be found in a greedy search strategy.


\section{Greedy PAG Search} \label{secGPS}
Given the procedure to obtain different neighbouring PAGs/MECs, all that remains to turn this into an effective search algorithm is a means to score individual PAGs. For simplicity, we will assume a multivariate Gaussian model.

\subsection{Scoring PAGs} \label{subScoreMEC}
When moving between equivalence classes, Algorithm \ref{alg:PAG-neighbours} expands each PAG to an arc-augmented MAG instance to verify validity. Given that for multivariate Gaussian models \cite{RichardsonS2002} already introduced a well-established MAG score, we will rely on that as an associated score for the corresponding equivalence class. Because it is already part of the literature, we relegate the description of the Gaussian MAG score to Appendix C in the supplement. For details see also \citep{NowzohourMEB2017, TriantafillouT2016}.

Note that the GPS algorithm itself is in no way restricted to multivariate Gaussian distributions. For example, we could equally well have chosen the score for binary/discrete data developed in \citep{Drton2008b}, or alternatively the ADMG score for nested Markov models in \citep{Shpitser2013} as a MAG corresponds to an ADMG for a nested Markov model without implied Verma constraints.

However, as it is also known that the Gaussian MAG score can be notoriously unstable for graphs with larger districts, we will also include an evaluation based on the so-called structural Hamming distance (SHD) relative to the true PAG, to illustrate the potential of the GPS search itself, separate from any potential scoring issues.


\subsection{The baseline GPS algorithm} \label{subGPS}
Having developed all the necessary tools we can now put them together into the (baseline) Greedy PAG Search (GPS) algorithm below.
It starts from an empty model and, using the operators from section \ref{sub:operators}, each time greedily tries to find a different, neighbouring PAG that will improve the score the most, until no more improvements can be found.

\begin{algorithm}[h]
  \caption{Greedy PAG Search}   \label{alg:GPS}
\begin{algorithmic}		% or algpseudocode?
   \STATE{\bfseries Input:} Gaussian covariance $\Sigma$ over $N$ variables
   \STATE{\bfseries Output:} optimal matching PAG $\cP$, top score $s$
   \STATE Initialise: $\M \gets$ empty MEC over $N$ variables, $s \gets 0$ 
   \REPEAT
   \STATE $\{ \bfM \} \gets Candidate\_Neighbours(\M)$
   \FORALL{$\M_i \in \bfM$}
   \STATE $s_i \gets Score(\M_i)$
   \STATE \textbf{if} $s_i > s$ \textbf{then} $(\M,s) \gets (\M_i,s_i)$
   \ENDFOR
   \UNTIL{no more improvement}
   \RETURN  $\cP \gets MEC\_to\_PAG(\M), s$
\end{algorithmic}
\end{algorithm}
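
As a compact reading of Algorithm \ref{alg:GPS}, each pass of the outer loop can be summarised as a single greedy step. The iteration index $t$ and the neighbourhood symbol $\mathcal{N}(\cdot)$ (the output of $Candidate\_Neighbours$) are notation introduced here for illustration only, and we assume a score for which higher is better, as in the comparison $s_i > s$:
\[
  \M^{(t+1)} \;=\; \arg\max_{\M' \in \mathcal{N}(\M^{(t)})} Score(\M'),
  \qquad \text{accepted only if } Score(\M^{(t+1)}) > Score(\M^{(t)}).
\]
The search stops at the first $t$ for which no neighbour satisfies the acceptance condition.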

The baseline PAG search aims to find a single, unambiguous target for each version of the operators. This limits the number of candidates to consider at each iteration in the search, which helps to speed up the overall process. The downside is that it becomes easier to get stuck in local optima, leading to suboptimal final solutions. Therefore we will also consider an alternative GPS version.

\subsection{Extended GPS search}

Effectively, the baseline operators avoid ambiguity by treating remaining circle marks in the modified graph as signifying `noncollider'. 
But, by definition, for any circle mark in a PAG there is at least one MAG instance that contains an arrowhead, and so for a newly created unshielded triple involving circles it is possible that the same operator applied to a MAG in the starting $\cP$ would have produced an unshielded collider triple instead. And for multiple such instances, any different combination of collider and noncollider triples with order corresponds to a different PAG. For example in Figure \ref{figModifiedPAG}, removing edge $C \cea E$ would create two new triples with order 0: $\seq{C,B,E}$ and $\seq{C,D,E}$, where both could become either collider or noncollider in one of four different valid PAGs.

That means that the current baseline search effectively only considers a small proportion of the possible set of neighbouring PAGs at each step. Therefore we also introduce a version of the search that generates an extended \textit{collection} of neighbours for each operator, one for each possible (non)collider combination of newly introduced unshielded triples.
In this version, both \textit{AddEdge} and \textit{DeleteEdge} now start from the PAG (rather than a specific tail-augmented MAG instance), where \textit{AddEdge} also considers all possible edge types to add at each application. 

This extended approach is similar in spirit to GES \citep{Chickering2002b}, which at each step also considers a (potentially large) collection of neighbouring equivalence classes per operator, whereas the baseline search is more in line with \citet{Chickering2002a}. 
To avoid the risk of having to consider too many candidates in cases where we encounter dense graphs we simply put a reasonable limit (in our case: 64) on the maximum number of local candidates per operator to consider, again similar to GES. No additional validity checks were implemented per operator, so we may expect the rejection rate to rise compared to the baseline version.
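
To make the size of this collection concrete: if a single operator application introduces $k$ new unshielded triples whose status is undetermined, then enumerating every collider/noncollider assignment yields at most
\[
  2^{k} \quad \text{candidate neighbours, e.g.\ } 2^{2} = 4 \text{ for the two triples } \seq{C,B,E} \text{ and } \seq{C,D,E} \text{ in Figure \ref{figModifiedPAG}}.
\]
Here $k$ is a symbol introduced only for this illustration. Under the assumption that the limit applies directly to this count, a cap of $64 = 2^{6}$ means the enumeration is exhaustive for $k \leq 6$ and truncated beyond that; for \textit{AddEdge} the count is further multiplied by the number of edge types considered.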

The added rigour of the extended search comes at a noticeable cost in terms of time per iteration. Therefore, as a way of illustrating the flexibility of the approach, we will also consider a \textit{hybrid} version that uses the baseline search as standard, and only switches to the extended version once it gets stuck.

One could envisage similar adaptations that restrict what operators can be used in different search stages, e.g.\ first only allowing `AddEdge', and then a second stage that only uses `DeleteEdge', to mimic the GES strategy. Alternatively, we could start from the output graph found by another method like FCI, and then try to tweak this for further improvements,  or restrict the search to stay within a likely skeleton, etc.

Finally, for this article we will only consider single runs for each GPS instance, but other familiar strategies to improve the final output, like tabu-search, multiple restarts, simulated annealing, etc., could also be employed.
Establishing what ultimately works best in what circumstances will be left as future work. 


\section{Experimental Evaluation}

\subsection{MAG-to-MEC Complexity} \label{sub:MAGMECeval}

\begin{figure}[H] 
	\includegraphics[width=\linewidth]{complexity_plot.png}
	\caption{Empirical complexity \textbf{MAG-to-MEC}.
	} \label{fig:EmpiricalComplexity}
\end{figure}

A crucial part of the proposed methodology is the new MEC characterization in terms of `triples with order'. In Section~\ref{sub:NewTriplesWithOrder}, we derived that for sparse graphs the theoretical complexity of the \textbf{MAG-to-MEC} algorithm is $O(n)$.
Figure~\ref{fig:EmpiricalComplexity} confirms this via the empirical complexity on random MAGs of size $n \in \{10, 20, \ldots, 100\}$, each averaged over 250 graphs. Similar to the simulation in \cite{HuE2020} the MAGs are generated to have approximately $e = 3n$ edges (corresponding to $d = 6$), while each edge is (independently) either directed or bidirected with probability $p = 0.5$.\footnote{Simulation details are available with the software at \url{https://github.com/tomc-ghub/gps_uai2022}.} 
The results demonstrate a strong linear trend (even slightly better), both in terms of `elementary operations' (purple) and raw computational time (cyan).
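
For reference, the stated correspondence between edge count and node degree follows from the fact that every edge contributes to the degree of two nodes, so that the average node degree is
\[
  d \;=\; \frac{2e}{n} \;=\; \frac{2 \cdot 3n}{n} \;=\; 6,
\]
independent of the graph size $n$. This is why the generated MAGs stay in the sparse regime as $n$ grows.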


\subsection{GPS Simulation Experiments} \label{sub:SimExp}

We evaluate the speed and accuracy of the three versions of the \textbf{GPS} algorithm. We compare our method against the GSMAG algorithm proposed by~\cite{TriantafillouT2016} and the GFCI algorithm proposed by~\cite{OgarrioSR2016}, while also showing the results obtained with FCI as a baseline. We also compared against DCD \citep{Bhattacharya2021}, which we found to perform slightly worse than GFCI at significantly longer running times, and so is left out of the final comparison.
We generated 100 MAGs for each graph size $n \in \{5, 10, 15, 20\}$, such that the average node degree was $d = 3$, the maximum node degree was $d_{\max} = 10$, and the probability of an edge being bidirected (as opposed to directed) was $p = 0.2$.

We used the following metrics to evaluate the algorithm performance: 1. the Structural Hamming Distance (\textit{SHD}), counting the number of different edges and/or edge marks between the output PAG and the ground truth PAG; 2. the Bayesian information criterion (\textit{BIC}) score for MAGs as proposed by~\cite{TriantafillouT2016}; and 3. the \textit{accuracy} of edge marks, obtained as a Jaccard similarity coefficient, by dividing the number of correct edge marks in the output PAG by the total number of edge marks in the (skeleton) union of output and ground truth PAG.
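
One way to write the accuracy metric explicitly is given below. The symbols $E_{\mathrm{out}}$ and $E_{\mathrm{true}}$ (the skeleton edge sets of the output and ground truth PAG) are introduced here for illustration, and the factor 2 reflects our assumption that every edge in the skeleton union contributes two edge marks, with marks on an edge missing from either graph counted as incorrect:
\[
  \text{accuracy} \;=\; \frac{\#\{\text{edge marks identical in output and ground truth PAG}\}}{2\,\lvert E_{\mathrm{out}} \cup E_{\mathrm{true}} \rvert}.
\]
Under this reading the accuracy equals 1 only if both PAGs coincide exactly, and it is penalised equally by missing edges, extra edges and wrong marks on shared edges.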

\begin{table*}[ht]
	\fontsize{7}{9}\selectfont
	\setlength{\tabcolsep}{3pt}
	\caption{\label{tab:accuracy}Algorithm accuracy comparison}
	\centering
	\begin{tabular}[t]{rlrrrrrrrrrrrrrrrrrr}
		\toprule
		\multicolumn{2}{c}{Algorithm} & \multicolumn{4}{c}{GPS baseline} & \multicolumn{4}{c}{GPS extended} & \multicolumn{4}{c}{GPS hybrid} & \multicolumn{4}{c}{GSMAG} & \multicolumn{1}{c}{GFCI} & \multicolumn{1}{c}{FCI} \\
		\cmidrule(l{3pt}r{3pt}){1-2} \cmidrule(l{3pt}r{3pt}){3-6} \cmidrule(l{3pt}r{3pt}){7-10} \cmidrule(l{3pt}r{3pt}){11-14} \cmidrule(l{3pt}r{3pt}){15-18} \cmidrule(l{3pt}r{3pt}){19-19} \cmidrule(l{3pt}r{3pt}){20-20}
		\multicolumn{2}{c}{Criterion} & \multicolumn{2}{c}{BIC} & \multicolumn{2}{c}{SHD} & \multicolumn{2}{c}{BIC} & \multicolumn{2}{c}{SHD} & \multicolumn{2}{c}{BIC} & \multicolumn{2}{c}{SHD} & \multicolumn{2}{c}{BIC} & \multicolumn{2}{c}{SHD} & \multicolumn{1}{c}{N/A} & \multicolumn{1}{c}{N/A} \\
		\cmidrule(l{3pt}r{3pt}){1-2} \cmidrule(l{3pt}r{3pt}){3-4} \cmidrule(l{3pt}r{3pt}){5-6} \cmidrule(l{3pt}r{3pt}){7-8} \cmidrule(l{3pt}r{3pt}){9-10} \cmidrule(l{3pt}r{3pt}){11-12} \cmidrule(l{3pt}r{3pt}){13-14} \cmidrule(l{3pt}r{3pt}){15-16} \cmidrule(l{3pt}r{3pt}){17-18} \cmidrule(l{3pt}r{3pt}){19-19} \cmidrule(l{3pt}r{3pt}){20-20}
		n & metric & empty & FCI & empty & FCI & empty & FCI & empty & FCI & empty & FCI & empty & FCI & empty & FCI & empty & FCI & N/A & N/A\\
		\midrule
		& SHD & 9.73 & 9.56 & 1.76 & 1.11 & 12.53 & 11.14 & 0.31 & 0.47 & 10.49 & 10.56 & 0.92 & 0.80 & 9.73 & 8.60 & 1.34 & 1.61 & 10.36 & 10.64\\
		
		& BIC & 12.88 & 12.81 & 13.09 & 12.89 & 12.46 & 12.53 & 12.89 & 12.89 & 12.63 & 12.61 & 12.94 & 12.90 & 12.41 & 12.55 & 12.95 & 12.90 & 12.99 & 13.05\\
		
		\multirow{-3}{*}{\raggedleft\arraybackslash 5} & accuracy & 0.50 & 0.50 & 0.88 & 0.93 & 0.36 & 0.41 & 0.98 & 0.97 & 0.45 & 0.44 & 0.93 & 0.94 & 0.52 & 0.55 & 0.92 & 0.90 & 0.45 & 0.42\\
		\cmidrule{1-20}
		& SHD & 24.48 & 23.36 & 5.82 & 5.15 & 35.80 & 31.13 & 0.54 & 3.75 & 31.26 & 28.99 & 2.94 & 4.51 & 38.45 & 31.02 & 3.47 & 2.63 & 21.51 & 22.77\\
		
		& BIC & 30.33 & 30.57 & 32.35 & 31.75 & 28.93 & 29.17 & 31.29 & 31.53 & 28.82 & 29.06 & 31.98 & 31.74 & 28.92 & 28.75 & 31.44 & 31.37 & 31.72 & 31.73\\
		
		\multirow{-3}{*}{\raggedleft\arraybackslash 10} & accuracy & 0.49 & 0.50 & 0.83 & 0.84 & 0.33 & 0.38 & 0.98 & 0.88 & 0.40 & 0.41 & 0.91 & 0.86 & 0.32 & 0.42 & 0.90 & 0.92 & 0.48 & 0.45\\
		\cmidrule{1-20}
		& SHD & 34.15 & 38.47 & 7.26 & 8.42 & 54.90 & 53.32 & 1.53 & 6.75 & 50.19 & 50.21 & 4.59 & 7.67 & 62.90 & 53.64 & 3.60 & 3.58 & 29.99 & 34.10\\
		
		& BIC & 36.54 & 36.48 & 40.29 & 39.63 & 33.24 & 33.51 & 38.59 & 39.25 & 32.58 & 33.42 & 39.91 & 39.61 & 32.83 & 32.31 & 38.19 & 38.18 & 38.74 & 39.18\\
		
		\multirow{-3}{*}{\raggedleft\arraybackslash 15} & accuracy & 0.52 & 0.47 & 0.85 & 0.83 & 0.32 & 0.34 & 0.97 & 0.86 & 0.37 & 0.38 & 0.90 & 0.84 & 0.31 & 0.38 & 0.92 & 0.92 & 0.50 & 0.43\\
		\cmidrule{1-20}
		& SHD & 44.35 & 49.69 & 9.96 & 11.20 & 82.98 & 74.49 & 1.69 & 8.67 & 69.11 & 70.45 & 6.63 & 10.73 & 94.06 & 74.46 & 4.47 & 3.34 & 36.82 & 42.72\\
		
		& BIC & 59.91 & 60.14 & 64.77 & 63.94 & 55.44 & 55.08 & 62.87 & 63.42 & 54.77 & 55.96 & 64.30 & 63.94 & 54.52 & 54.23 & 62.61 & 62.55 & 63.53 & 63.90\\
		
		\multirow{-3}{*}{\raggedleft\arraybackslash 20} & accuracy & 0.55 & 0.50 & 0.84 & 0.83 & 0.30 & 0.36 & 0.97 & 0.87 & 0.38 & 0.38 & 0.89 & 0.83 & 0.29 & 0.39 & 0.93 & 0.95 & 0.55 & 0.46\\
		\bottomrule
	\end{tabular}
\end{table*}


\begin{table*}[ht]
	\fontsize{7}{9}\selectfont
	\setlength{\tabcolsep}{3pt}	
	\caption{\label{tab:speed_full}Algorithm speed comparison}
	\centering
	\begin{tabular}[t]{rlrrrrrrrrrrrrrrrr}
		\toprule
		\multicolumn{2}{c}{Algorithm} & \multicolumn{4}{c}{GPS baseline} & \multicolumn{4}{c}{GPS extended} & \multicolumn{4}{c}{GPS hybrid} & \multicolumn{4}{c}{GSMAG} \\
		\cmidrule(l{3pt}r{3pt}){1-2} \cmidrule(l{3pt}r{3pt}){3-6} \cmidrule(l{3pt}r{3pt}){7-10} \cmidrule(l{3pt}r{3pt}){11-14} \cmidrule(l{3pt}r{3pt}){15-18}
		\multicolumn{2}{c}{Criterion} & \multicolumn{2}{c}{BIC} & \multicolumn{2}{c}{SHD} & \multicolumn{2}{c}{BIC} & \multicolumn{2}{c}{SHD} & \multicolumn{2}{c}{BIC} & \multicolumn{2}{c}{SHD} & \multicolumn{2}{c}{BIC} & \multicolumn{2}{c}{SHD} \\
		\cmidrule(l{3pt}r{3pt}){1-2} \cmidrule(l{3pt}r{3pt}){3-4} \cmidrule(l{3pt}r{3pt}){5-6} \cmidrule(l{3pt}r{3pt}){7-8} \cmidrule(l{3pt}r{3pt}){9-10} \cmidrule(l{3pt}r{3pt}){11-12} \cmidrule(l{3pt}r{3pt}){13-14} \cmidrule(l{3pt}r{3pt}){15-16} \cmidrule(l{3pt}r{3pt}){17-18}
		n & metric & empty & FCI & empty & FCI & empty & FCI & empty & FCI & empty & FCI & empty & FCI & empty & FCI & empty & FCI\\
		\midrule
		& iterations & 7.80 & 2.08 & 9.00 & 3.81 & 6.77 & 2.20 & 8.05 & 3.47 & 9.33 & 3.49 & 10.18 & 4.88 & 8.28 & 3.00 & 8.09 & 3.53\\
		
		\multirow{-2}{*}{\raggedleft\arraybackslash 5} & time (s) & 0.33 & 0.18 & 0.23 & 0.15 & 0.97 & 0.49 & 1.12 & 0.67 & 0.63 & 0.45 & 0.52 & 0.38 & 1.60 & 1.24 & 0.53 & 0.48\\
		\cmidrule{1-18}
		& iterations & 19.31 & 7.34 & 20.41 & 7.56 & 17.73 & 7.45 & 16.29 & 6.80 & 23.37 & 10.99 & 22.20 & 8.76 & 21.14 & 10.72 & 16.94 & 8.34\\
		
		\multirow{-2}{*}{\raggedleft\arraybackslash 10} & time (s) & 8.25 & 9.23 & 4.13 & 2.36 & 29.53 & 24.75 & 21.20 & 12.22 & 17.86 & 22.09 & 9.95 & 5.62 & 50.66 & 55.06 & 10.04 & 14.30\\
		\cmidrule{1-18}
		& iterations & 27.53 & 13.18 & 29.68 & 11.06 & 26.86 & 13.57 & 23.12 & 9.20 & 34.69 & 19.74 & 31.49 & 12.23 & 33.04 & 17.98 & 25.07 & 12.69\\
		
		\multirow{-2}{*}{\raggedleft\arraybackslash 15} & time (s) & 33.62 & 55.08 & 18.25 & 9.88 & 156.62 & 187.08 & 102.49 & 51.07 & 98.08 & 149.59 & 41.05 & 20.24 & 321.62 & 360.39 & 70.83 & 59.20\\
		\cmidrule{1-18}
		& iterations & 38.65 & 18.47 & 39.79 & 13.40 & 40.10 & 21.36 & 30.82 & 11.27 & 48.85 & 28.49 & 41.74 & 14.53 & 48.92 & 26.60 & 33.14 & 15.96\\
		
		\multirow{-2}{*}{\raggedleft\arraybackslash 20} & time (s) & 107.39 & 167.10 & 56.80 & 28.00 & 564.51 & 729.35 & 365.16 & 173.52 & 342.30 & 489.25 & 113.12 & 51.15 & 926.26 & 863.41 & 240.71 & 148.44\\
		\bottomrule
	\end{tabular}
\end{table*}


The accuracy results are summarized in Table~\ref{tab:accuracy}. For all GPS versions and GSMAG, we considered two different starting points for the greedy search, namely the empty graph and the PAG obtained by running the FCI algorithm. We used the BIC score for MAGs~\citep{TriantafillouT2016} as the objective function in the greedy optimization. We ran FCI and GFCI using the Tetrad library \citep{GlymourSS2014} %(\url{https://books.google.nl/books?hl=en&lr=&id=iA_jBQAAQBAJ}) 
with default parameters, where Fisher's $z$-test was used for finding conditional independences, and the BIC score was used for the score-based component of GFCI. 

In Table \ref{tab:accuracy} we first note that in terms of accuracy when using the BIC score, baseline GPS and GFCI are the clear winners. Extended/hybrid GPS and GSMAG all manage to obtain better (lower) BIC scores, however at abysmal accuracy ratings, indicating a fundamental issue with the Gaussian MAG score. On closer inspection this turns out to result from unstable BIC scores, primarily related to larger districts, where the RICF fitting step fails to converge properly.
Baseline GPS tends to favour graphs with fewer/smaller districts (due to the `default noncollider' option) for which this issue is much less pronounced. Using the SHD as search criterion shows a dramatically different result: here GSMAG clearly outcompetes the baseline GPS search, with accuracies in the low 90\% range rather than the mid 80s for the latter. At the same time, it demonstrates the potential and effectiveness of the extended search, which obtains accuracies of 97--98\%. A second interesting observation here is that starting from the FCI PAG actually hinders the extended GPS search from achieving its optimal score by a significant margin (around 10\% worse), suggesting that FCI tends to favour a local optimum from which it can be difficult to escape to the optimal graph. This is also reflected in the hybrid version, which runs the extended search on top of the baseline output, but is also pushed into a region of PAG space from which it is harder to escape in a single-run greedy search.

When it comes to speed, shown in Table~\ref{tab:speed_full}, baseline GPS arrives at results much faster than GSMAG in all cases, as it needs to consider far fewer candidates per step. Starting from the FCI PAG cuts the number of iterations roughly in half, although the time required when using the BIC score can actually increase, again signalling the convergence issue.
The extended GPS-SHD version shows that the number of iterations required to obtain the optimal model is about a quarter lower than for the baseline version, indicative of the added flexibility the extended neighbour collection can bring. The unsurprising drawback of this larger collection is that the actual running time can be 5--6$\times$ greater. Hybrid GPS performance is somewhere halfway between the two. 


To give an indication of the effectiveness of the validity checks: on a typical batch of 16 graphs over various sizes and densities we found that baseline GPS rejected 2879 candidates, and accepted 11517 as neighbouring PAGs, out of which 130 ($1.1\%$) were found to be invalid at the final MAG validation check.
For the same batch, the extended GPS version rejected 5969 candidates at the initial check, while accepting 47523, out of which 2087 ($4.4\%$) were found to be invalid after all. 

That means the basic validity tests already filter out close to $95\%$ of all invalid operators. Undoubtedly this can be increased further, but there is a risk the added overhead of significantly more complicated validity tests may not outweigh the benefits of avoiding an extra MEC-PAG-MAG conversion for $1\%$ of the candidates. Similarly, the extended version now captures about $75\%$ of all invalid operators, but this can likely be brought to around the level of the baseline version by adding explicit basic validity checks for each candidate considered by an operator (rather than the single check per operator it is now). 
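
These percentages can be reproduced from the counts reported above, under the assumption that every candidate rejected by the basic checks is indeed invalid, so that the total number of invalid candidates is the sum of the rejected ones and those caught only at the final MAG validation:
\[
  \begin{aligned}
    \text{baseline:} \quad & \frac{2879}{2879 + 130} = \frac{2879}{3009} \approx 0.957, \\
    \text{extended:} \quad & \frac{5969}{5969 + 2087} = \frac{5969}{8056} \approx 0.741.
  \end{aligned}
\]
The residual fractions among accepted candidates are $130/11517 \approx 1.1\%$ and $2087/47523 \approx 4.4\%$ respectively.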


\section{Conclusion}
We presented GPS, the first score-based equivalence search algorithm in the presence of latent confounders. It was based on a new MEC characterization for MAGs that brings establishing Markov equivalence between sparse graphs down to linear complexity, with the new core PAG providing the crucial link to efficient PAG reconstruction. 
Experimental results confirmed our hopes/expectations that equivalence search could traverse the MAG space faster than single-edge MAG modifications, while arriving at better models, comparable to or improving on other state-of-the-art methods, and that additional gains can be expected by incorporating more comprehensive search strategies like tabu-search and multiple restarts. 
Looking forward, we aim to expand GPS further by considering the full PAG neighbourhood as candidates (similar to GES), and including a more robust equivalence score that can also handle selection bias. 

\begin{acknowledgements} % will be removed in pdf for initial submission,
                         % so you can already fill it to test with the
                         % ‘accepted’ class option
    We thank anonymous reviewers for valuable feedback and helpful suggestions.

\end{acknowledgements}

\bibliography{uai_GPS_2022}

\end{document}
