\section{Background}
In this section, we give the most relevant definitions. We use boldface letters to denote sets of random variables. For other graph notation and terminology, please refer to Appendix~\ref{app:graphnotations}. We discuss related work in Appendix~\ref{app:relatedwork}.

\textbf{Causal Graphs.}
A \textit{causal graph} encodes the causal relationships among variables as a directed acyclic graph (DAG), where each node represents a variable and a directed edge $X \rightarrow Y$ indicates that $X$ causes $Y$. A variable is said to cause another variable if a change in the former induces a change in the probability distribution of the latter.

\textbf{Structural Causal Models (SCMs) and Causal Bayesian Networks (CBNs).} SCMs model causality among a set of random variables. Each variable $X$ is a function of a set of endogenous variables, called its parents and denoted by $Pa(X)$, and an exogenous noise term $E_{X}$, i.e., $X = f_{X}(Pa(X), E_{X})$. An SCM induces a causal graph in which, for every variable $X$, there is a directed edge from each variable in $Pa(X)$ to $X$. CBNs define a causal model on a causal graph that specifies the observational and interventional distributions via the truncated factorization formula, without requiring the functional descriptions of an SCM.
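
As an illustration (a minimal sketch; the three-variable model below is a hypothetical example and is not taken from our experiments), consider an SCM over $X_1, X_2, X_3$ with
\[
X_1 = f_{X_1}(E_{X_1}), \qquad X_2 = f_{X_2}(X_1, E_{X_2}), \qquad X_3 = f_{X_3}(X_1, X_2, E_{X_3}),
\]
so that $Pa(X_1) = \emptyset$, $Pa(X_2) = \{X_1\}$, and $Pa(X_3) = \{X_1, X_2\}$. The induced causal graph has the edges $X_1 \rightarrow X_2$, $X_1 \rightarrow X_3$, and $X_2 \rightarrow X_3$. Assuming the noise terms are jointly independent, the observational distribution factorizes as $P(x_1, x_2, x_3) = P(x_1)\, P(x_2 \mid x_1)\, P(x_3 \mid x_1, x_2)$. Under a hard intervention $do(X_2 = x_2')$, the truncated factorization drops the factor of the intervened variable and fixes its value in the remaining factors:
\[
P(x_1, x_3 \mid do(X_2 = x_2')) = P(x_1)\, P(x_3 \mid x_1, x_2').
\]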

\textbf{D-separation, Markov Equivalence, CPDAG.} In a causal graph $D$, a path $p$ between $X$ and $Y$ is \textit{d-connecting (active)} relative to a set of vertices $\Z$ (with $X, Y \notin \Z$) if $(i)$ every non-collider on $p$ is not in $\Z$ and $(ii)$ every collider on $p$ is an ancestor of some $Z \in \Z$. Otherwise, we say that $\Z$ \textit{blocks} $p$. If $\Z$ blocks all paths between $X$ and $Y$, we say that $X$ and $Y$ are \textit{d-separated} relative to $\Z$, denoted by $(X \indep Y|\Z)_{D}$. Two DAGs are \textit{Markov equivalent} if they share the same set of d-separation statements. The set of all DAGs that are Markov equivalent to $D$ is called the Markov equivalence class of $D$, denoted by $[D]$. From observational data alone, a DAG is in general identifiable only up to its Markov equivalence class, since Markov equivalent DAGs can generate the same observational distribution. This motivates a graphical representation of the whole class. A \textit{completed partially directed acyclic graph} (CPDAG) represents $[D]$: it has the same skeleton as $D$, with a directed edge $X_i \rightarrow X_j$ if that orientation holds in all DAGs in $[D]$, and an undirected edge otherwise.
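
As a small worked example (hypothetical three-variable graphs, used for illustration only), consider the following DAGs over $X$, $W$, and $Y$:
\[
X \rightarrow W \rightarrow Y, \qquad X \leftarrow W \leftarrow Y, \qquad X \leftarrow W \rightarrow Y, \qquad X \rightarrow W \leftarrow Y.
\]
In the first three graphs, $W$ is a non-collider on the only path between $X$ and $Y$. The path is therefore active relative to $\emptyset$ and blocked by $\{W\}$, so the only d-separation statement is $(X \indep Y|W)_{D}$. These three DAGs are Markov equivalent, and their CPDAG is $X - W - Y$. In the fourth graph, $W$ is a collider: the path is blocked by $\emptyset$ and active relative to $\{W\}$ (treating $W$ as an ancestor of itself). This DAG is the only member of its Markov equivalence class, and its CPDAG is $X \rightarrow W \leftarrow Y$ itself.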


\textbf{Possible Parents relative to Equivalence Class} $[D]$. $X$ is called a \textit{possible parent} of $Y$, denoted as $X \in PossPa_{D}(Y)$, if the graph representing $[D]$ contains any of the following edges: $\{X-Y, X \crightarrow Y, X \rightarrow Y, X\circlecircle Y\}$. The notations $X \crightarrow Y$ and $X\circlecircle Y$ are only applicable to a particular partial structure known as the $k$-essential graph, which we discuss in greater detail in Appendix~\ref{app:extension_discussion}.
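For instance (an illustrative case over hypothetical variables $X$, $W$, $Y$), in the CPDAG $X - W - Y$ both $X$ and $Y$ are possible parents of $W$, and $W$ is a possible parent of both $X$ and $Y$. In the CPDAG $X \rightarrow W \leftarrow Y$, $X$ and $Y$ are again possible parents of $W$, whereas $W$ is not a possible parent of $X$ or of $Y$, since both edges point into $W$.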

% Other partial causal structures have been proposed under different types of Markov equivalence such as $k$-CPDAG \cite{wienobst2020recovering}, $k$-essential graphs \cite{kocaoglu2023characterization}, and $\mathcal{C}$-essential graphs \cite{lee2024constraint} for generalizing CPDAGs  with a more flexible choice of the d-separation constraints by restricting the conditioning set size. Later, we will show that our proposed algorithm can accept any of these representations as input. As such, we define a single graphical representation to represent a wide spectrum of these equivalence classes as the union of a set of DAGs for the sake of readability. 

% Let us call a graph $G$, which represents some equivalence class of DAGs $\{D\}_{D\in \mathcal{S}}$,  non-ancestrally informative if the following holds: If $X\rightarrow Y\in G$ then $Y\not\in An(X)$ for any $D\in S$.

% \textbf{Abstract Equivalence.} An abstract equivalence class of a DAG $D$ is denoted as $\mathcal{A}(D)$, where its edges are defined as follows:
% (i) $X\mbox{ --- }Y := X\rightarrow Y \cup X\leftarrow Y$, (ii) $X\, o\mbox{---}o \, Y := X\rightarrow Y \cup X\,\leftarrow Y \cup X\leftrightarrow Y$, (iii) $X\,\crightarrow \, Y := X\rightarrow Y \cup X\leftrightarrow Y$. Two DAGs are abstractly equivalent if they belong to the same equivalence class that can be represented by one of the following partial causal structures: CPDAG, $k$-CPDAG, $k$-essential graphs, $\mathcal{C}$-essential graphs.  We use $*$ to denote a wildcard mark of any of the following marks: a tail, an arrowhead, and a circle.}


\textbf{Intervention and F-NODE.} An intervention on a variable is the process of changing the generative mechanism of that variable. Randomized controlled trials (RCTs) and A/B tests are the most common examples of interventions. Pearl uses the do-operator $do(X=x)$ to capture this type of intervention. When $do(X=x)$ forces a variable $X$ to take on a fixed value, it is known as a \textit{hard} intervention \citep{pearl2009causal}. Its effect on the causal graph is to remove the edges incoming to the intervened nodes. This differs from a \textit{soft} intervention, which does not completely replace the causal mechanism: it retains the original causal graph and only replaces $f_{X}(Pa(X), E_{X})$ with $f'_{X}(Pa(X), E_{X})$, where $f'_{X} \ne f_{X}$. An auxiliary variable called an F-NODE has been used extensively to represent the effect of an intervention on a system \citep{pearl1995causal, yang2018characterizing, mooij2020joint}. Throughout this work, we write $D_{aug}$ for the ground-truth DAG $D$ augmented with an F-NODE that represents an intervention on the root cause. We will discuss its role in RCA in the next section. We assume that there are no latent confounders. We also make the extended faithfulness assumption as in \cite{jaber2020causal}, that is, any statistical independence implies the corresponding d-separation. Please refer to Assumptions~\ref{assumption:causal Markov assumption} and~\ref{assumption:faithfulness} in the appendix for more details.
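
To make the distinction between the two types of intervention concrete (a sketch under illustrative assumptions: the binary coding of the regime indicator $F$ and the single edge $F \rightarrow X$ below follow a common convention and are not meant as the formal construction of $D_{aug}$), consider a variable with mechanism $X = f_{X}(Pa(X), E_{X})$:
\[
\mbox{hard: } X = x, \qquad\qquad \mbox{soft: } X = f'_{X}(Pa(X), E_{X}), \quad f'_{X} \ne f_{X}.
\]
Under the hard intervention, $X$ no longer depends on $Pa(X)$, which is why its incoming edges are removed. Under the soft intervention, $X$ still depends on $Pa(X)$, so the graph is unchanged. If an F-NODE $F$ is added as a parent of $X$, both regimes can be described by a single conditional distribution:
\[
P(x \mid pa(X), F = 0) = P(x \mid pa(X)), \qquad P(x \mid pa(X), F = 1) = P'(x \mid pa(X)),
\]
where $pa(X)$ is a value of $Pa(X)$ and $P'$ denotes the conditional distribution induced by $f'_{X}$. In this representation, a change of mechanism appears as a dependence of $X$ on $F$ given $Pa(X)$.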


% \begin{assumption}[Causal Markov condition]\label{assumption:causal Markov assumption}
% A distribution $P$ is called \textit{Markov} relative to a graph $D=(\V,\mathbf{E})$ if every variable is independent of its non-descendants conditioned on its parents in $D$.
% \end{assumption}

% \begin{assumption}[faithfulness]\label{assumption:faithfulness}
%     For a DAG $D=(\V,\Eb)$ with distribution $P$, any $X, Y \in \V$, are d-separated by a set $\Z \subset \V \setminus \{X, Y\}$ in $D$ if $X$ and $Y$ are conditionally independent given $\Z$ in $P$.
% \end{assumption}

% \textbf{Mutual Information.} Mutual information is a measure of dependence between random variables based on Shannon entropy. It quantifies the reduction in uncertainty about $X$ when $Y$ is known.
% \begin{equation}
%     I(X;Y) = H(X) - H(X|Y)
% \end{equation} 
% For three discrete random variables $X,Y,Z$, the conditional mutual information (CMI) is defined as 
% \begin{equation}
%     I(X;Y|Z) = H(X|Z) + H(Y|Z) - H(X,Y|Z)
% \end{equation} One important property of conditional mutual information is non-negativity.
% \begin{corollary}[\citealt{cover1999elements}]\label{cor:cmi_corrollary}
%     $I(X;Y|Z) \ge 0$ with equality if and only if $X$ and $Y$ are conditionally independent given $Z$
% \end{corollary}