\section{RCA with a Partial Graph}\label{sec:unknown-graph}

% Have a set of figures to show what unknown graph means

\begin{figure}[t]
    \centering
\includegraphics[width=\columnwidth]{figures/k-RCD.pdf}
    \caption{\name framework: The true graphs, $D$ and $D_{aug}$, are unknown to the algorithm. Red nodes represent the root cause, while orange nodes are impacted but are not the root cause. During the normal period, \rev{\name learns a CPDAG from a sound causal discovery algorithm.} After a failure, it identifies the root cause by performing marginal invariance tests to further orient the edges and by computing the Conditional Mutual Information (CMI), denoted by $I$, between $\fnode$ and each node in the graph. Finally, \name ranks the nodes by their CMI scores and outputs an ordered list of potential root causes.}\label{fig:framework}
\end{figure}

\begin{figure}[t]
    \centering 
   
    % figure a 
     \subfigure[$D_{1}$]
    {     
        \begin{tikzpicture}[scale=0.35]
            \node (2) at (0,0) {$X_{1}$};
            \node (3) at (3,0) {$X_{2}$};
            \node (4) at (6,0) {$X_{3}$};
            \node (5) at (3,3) {$X_{4}$};
            \path (2) edge (3);
            \path (3) edge (4);
            \path (5) edge (3);
        \end{tikzpicture}
        \label{2a-1}
    }
    \subfigure[$D_{1_{aug}}$]
    {     
        \begin{tikzpicture}[scale=0.35]
            \node (2) at (0,0) {$X_{1}$};
            \node (3) at (3,0) {$X_{2}$};
            \node (4) at (6,0) {$X_{3}$};
            \node (5) at (3,3) {$X_{4}$};
            \node (6) at (0,3) {$F$};
            \path (2) edge (3);
            \path (3) edge (4);
            \path (5) edge (3);
            \path[red, line width=1.0] (6) edge (2);
        \end{tikzpicture}
        \label{fig:g-a}
    }
    % figure b 
    \subfigure[$D_{2_{aug}}$] 
    {         
        \begin{tikzpicture}[scale=0.35]
            \node (1) at (0,0) {$X_{1}$};
            \node (2) [right=of 1] {$X_{2}$};
            \node (3) at (2,3) {$X_{3}$};
            \node (4) at (6,3) {$F$};
            \path[red, line width=1.0] (4) edge (3);
            \path (1) edge (2);
            \path (3) edge (2);
            \path (3) edge (1);
        \end{tikzpicture}
        \label{fig:g-b}
    }
    % figure c 
    \subfigure[$\mathcal{C}(D_{2})$] 
    {         
        \begin{tikzpicture}[scale=0.35]
            \node (1) at (0,0) {$X_{1}$};
            \node (2) [right=of 1] {$X_{2}$};
            \node (3) at (2.5,3) {$X_{3}$};
            \path[-] (1) edge (2);
            \path[-] (3) edge (2);
            \path[-] (3) edge (1);
        \end{tikzpicture}
        \label{fig:g-c}
    }
    % figure d 
    \subfigure[$D_{3_{aug}}$]
    {     
        \begin{tikzpicture}[scale=0.35]
            \node (2) at (0,0) {$X_{1}$};
            \node (3) at (3,0) {$X_{2}$};
            \node (4) at (6,0) {$X_{3}$};
            \node (1) at (3,3) {$F$};
            \path (2) edge (3);
            \path (3) edge (4);
            \path[red, line width=1.0] (1) edge (3);
        \end{tikzpicture}
        \label{fig:g-d}
    }
    % figure e 
    \subfigure[$\mathcal{C}(D_{3})$]
    {     
        \begin{tikzpicture}[scale=0.4]
            \node (2) at (0,0) {$X_{1}$};
            \node (3) at (3,0) {$X_{2}$};
            \node (4) at (6,0) {$X_{3}$};
            \path[-] (2) edge (3);
            \path[-] (3) edge (4);
        \end{tikzpicture}
        \label{fig:g-e}
    }
    \caption{(a) A true graph $D_{1}$, which is also the CPDAG of $D_{1}$.
        (b) The true augmented graph $D_{1_{aug}}$ obtained from $D_{1}$, showing how a CPDAG can help identify root causes more efficiently.
        (c)--(d) A true augmented graph $D_{2_{aug}}$ obtained from $D_{2}$, and the CPDAG $\mathcal{C}(D_{2})$. Because this CPDAG has no orientations, it may not help identify root causes even with additional CI tests. (e)--(f) A true augmented graph $D_{3_{aug}}$ obtained from $D_{3}$, and the CPDAG $\mathcal{C}(D_{3})$, showing that not all CPDAGs without orientations are equally informative for RCA.
    }
    \label{fig:challenge_example}
\end{figure}

Having established that a causal graph helps to reduce the number and order of CI tests, we now turn to the challenge of performing RCA with a partial graphical structure in the case of multiple root causes. Figure~\ref{fig:framework} shows the workflow of the proposed solution. All proofs are provided in Appendix~\ref{app:proofs}. We start with an example, illustrated in Figure~\ref{fig:challenge_example}, that highlights three main challenges of combining a partial causal structure with CI tests. For simplicity, we use a CPDAG as the given partial causal structure. However, our results in this section also hold for other partial causal structures, which we discuss in Appendix~\ref{app:extension_discussion}. We briefly discuss the challenges of learning a CPDAG from observational data on top of RCA, and the benefits of using other partial causal structures, at the end of Appendix~\ref{app:relatedwork}.

\textbf{Motivating Example.} Consider the true augmented graph $D_{1_{aug}}$ shown in Figure~\ref{fig:g-a}. The CPDAG of $D_{1}$ is the induced subgraph of $D_{1_{aug}}$ obtained by removing $F$. Here, a single CI test suffices to identify the root cause: we select $X_{1}$ and test the CI relation $(F \indep X_{1})_{P}$. Since $X_{2}, X_{3}, X_{4}$ are non-ancestors of $X_{1}$ in the CPDAG and $(F \dep X_{1})_{P}$, it follows that $X_{1}$ must be a child of $F$ in the ground truth. Hence, $X_{1}$ is the root cause. In contrast, even in the best case, RCD must perform $6$ CI tests, e.g., $(F \indep X_{4})_{P}, (F \dep X_{2})_{P}, (F \indep X_{2}|X_{1})_{P}$, $(F \dep X_{3})_{P}$ and $(F \indep X_{3} | X_{2})_{P}$ (or $(F \indep X_{3} | X_{1})_{P}$), to conclude that $X_1$ is the root cause. The first challenge, however, is that it is unclear which variable to select for the initial CI test. The second challenge is that some CPDAGs do not have any orientations, as shown in Figure~\ref{fig:g-c}. In this case, we cannot exploit any ancestral relationships even if we exhaust all marginal tests. The third challenge is that not all CPDAGs without orientations are equally informative for RCA. Consider another true augmented graph in Figure~\ref{fig:g-d} and the corresponding CPDAG learned from observational data in Figure~\ref{fig:g-e}. One can infer that: (i) $F$ cannot point to $X_{1}$ due to $(F\indep X_{1})_{P}$; (ii) $F$ has a directed path to $X_{2}$. Therefore, $X_{1}-X_{2}$ can be further oriented as $X_{1}\rightarrow X_{2}$ in Figure~\ref{fig:g-e} with interventional data. Since all unshielded colliders in Figure~\ref{fig:g-e} would already have been oriented, $X_{2}-X_{3}$ can then be further oriented as $X_{2} \rightarrow X_{3}$, resulting in $X_{1}\rightarrow X_{2} \rightarrow X_{3}$. Hence, we can conclude that $X_{2}$ is the root cause, as $X_{2}$ is the parent of $X_{3}$, without testing whether $(F\indep X_{3})_{P}$ holds.



% One common approach to learning the causal structure is to incorporate expert knowledge~\citep{chakraborty2023causil, gong2024porca, lin2024root, xin2023causalrca}. However, it may not always be feasible to obtain expert knowledge. A data-driven approach to causal structure learning then becomes a more viable solution. However, learning a causal structure can be extremely time-consuming~\citep{chickering2004large}. For constraint-based methods, they often involve conditioning on large sets of nodes to identify possible separating sets for each node~\citep{spirtes2000causation}. This time-consuming aspect of causal discovery is particularly undesirable in our context, where time is critical following a failure, and the goal is to quickly pinpoint the root cause.

% We begin with an observation that the lifetime of an application can be divided into two phases: the normal period and the post-failure period. The normal period refers to the time when the system is functioning normally without any issues, while the post-failure period starts once a failure has been reported. Leveraging this temporal division, we make a crucial observation: \textit{the normal period can be used to proactively prepare for potential failures}. We argue that identifying the root cause of a failure is time-critical \emph{after} the failure occurs, during the post-failure period. However, during the normal period, there is ample time to prepare for such events. Viewing the system through the lens of the normal and post-failure periods allows us to perform computationally expensive operations during the normal period when there is no urgent time constraint. This pre-processing step can be extremely beneficial during the post-failure period, where time is of utmost importance.

% A key point is that learning causal structures does not require interventional data~\citep{spirtes2000causation, chickering2002optimal, shimizu2006linear, zheng2018dags}. We can leverage the vast amounts of data generated during the system's normal operation to construct the causal graph, rather than waiting for a failure. This graph can then be used to efficiently identify the root cause when a failure occurs, enabling a faster, more effective response.
% This means we do not need to wait for a failure to occur before constructing the causal graph. Instead, we can utilize the vast amounts of data generated during the system's normal operation to learn the causal graph. Once we have this causal graph from observational data, it can be utilized to efficiently identify the root cause \emph{after} a failure occurs, allowing for a more rapid and effective response.

% One common approach to learning the causal structure is to incorporate expert knowledge~\cite{chakraborty2023causil, gong2024porca, lin2024root, xin2023causalrca}. In these systems, expert knowledge can be derived either from collected data, which is then used to train a neural network model to generate the graph, or from intrinsic components such as the service call graph, which directly informs the graph's construction. In most cases, the output is a DAG. As long as the constructed graph is a DAG, we can apply Algorithm \ref{alg:rcd-g} to efficiently identify the root cause following a failure. 

% Another approach that solely relies on collected data to construct the graph is to use extensive literature from the causal discovery. Approaches such as PC and FCI, learn a causal graph from the given data, however, the learned graph is not a DAG as the causal graph can only be learned by its Markove Equivlance class given the access to only the observational data. This graph structure is an essential graph of the true causal DAG

% - Now, how to learn the graph?
% - Expert knowledge, call graph -> DAG -> Algorithm 1
% - No graph available -> causal discovery (pc, kpc, cpc) -> take kpc because of low-order CI tests so sample efficient -> Algorithm 2
% - Algorithm 2 is also applicable for any essential graph.

\textbf{Ranking Root Causes.} A key requirement for RCA tools is the output format. While failures typically have few root causes, much of the literature focuses on ranking nodes and reporting the top-$l$. This poses a challenge for approaches that rely on CI tests, which often identify only a single or a few root cause nodes. RCD addresses this by gradually increasing the significance level, $\alpha$, in its CI tests and rerunning the algorithm until at least $l$ nodes are identified. However, this does not guarantee a meaningful ranking; the resulting nodes may appear in an arbitrary order, and multiple reruns increase runtime.

\rev{To address this along with the challenges mentioned previously}, \name (Algorithm~\ref{alg:sample-version}) leverages the critical insight that the ranking in RCA aligns with an information-theoretic approach. Any non-root-cause variable $\bar{R}$ is d-separated from $F$ given its parents $Pa(\bar{R})$, while only the true root cause $R$ remains d-connected to $F$ given its parents $Pa(R)$. Under the faithfulness assumption, $F$ must be conditionally dependent on $R$ given $Pa(R)$, and by Assumption~\ref{assumption:causal Markov assumption}, $F$ must be conditionally independent of $\bar{R}$ given $Pa(\bar{R})$. These conditional dependencies can be measured using CMI. Thus, RCA with a partial causal structure can be broken down into two steps: finding the parents of each variable and estimating the CMI given its parents. Ranking the potential root causes is achieved by sorting the CMI values in descending order. This non-parametric method is robust, capturing both linear and nonlinear dependencies, and works across various types of distributions, whether discrete, continuous, or mixed.
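
Written out as a formula, and only as a sketch under the Markov and faithfulness assumptions stated above (with exact rather than estimated information values), the ranking criterion reads
\begin{align*}
    I(F; R \mid Pa(R)) &> 0, \\
    I(F; \bar{R} \mid Pa(\bar{R})) &= 0,
\end{align*}
where $R$ is a root cause and $\bar{R}$ is a non-root-cause variable. With exact CMI values, sorting in descending order would therefore place every root cause strictly above every non-root-cause variable.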

% \begin{proposition}\label{prop:maximizing_cmi}
%     Given any DAG $D$, under Assumptions \ref{assumption:causal Markov assumption} and \ref{assumption:faithfulness}, 
%     \begin{align}
%         & I(F; R |Pa_{D}(R)) > 0\\
%         &  I(F; \bar{R} |Pa_{D}(\bar{R})) = 0
%     \end{align}, where $R$ is the actual root cause and $\bar{R}$ denotes a non-root cause variable.
% \end{proposition}


% This non-parametric approach is desirable as it can capture both linear and nonlinear dependencies. It also accommodates a wide range of joint distributions, whether discrete, continuous, or a mixture of both. 

% Learning the partial causal graph from data requires a series of high-order CI tests~\citep{spirtes2000causation}. However, the statistical power of these tests diminishes significantly as the size of the conditioning set increases~\citep{shah2020hardness, kocaoglu2023characterization}. To address this issue, we propose using a more robust approach through the generalized $\mathcal{C}$-PC algorithm~\citep{lee2024constraint}, which obtains a $\mathcal{C}$-essential graph. This graph represents the Markov equivalence class of DAGs based on a restrictive set $\mathcal{C}$ of conditioning sets. The set $\mathcal{C}$ allows us to specify which conditioning set to use, enabling reliance on CI tests with smaller conditioning sets and avoiding high-dimensional variables. For details about the $\mathcal{C}$-essential graph and its interpretation, see Appendix~\ref{app:graphnotations} and~\ref{app:samplerun-cpc}. We also discuss the challenges of using CI tests exclusively for RCA with a $\mathcal{C}$-essential graph in Appendix~\ref{app:motivating-examples-and-challenges}.

However, given a CPDAG, the parent set of each variable may not always be known. Our key contribution is to show that computing the CMI between $F$ and each variable $X$, conditioned on the possible parent set of $X$, is sufficient to identify root causes. This allows us to identify the root cause using only $n$ marginal invariance tests, where $n$ is the number of observed variables. We prove the soundness of our algorithm for identifying root causes (Theorem~\ref{thm:soundness}). This result further extends to other partial causal structures in the form of a mixed graph, such as $k$-CPDAGs \cite{wienobst2020recovering} or $k$-essential graphs \cite{kocaoglu2023characterization} for the data-scarce regime, which we discuss in Appendix~\ref{app:extension_discussion}. Our algorithm can accept the output of \textit{any} causal discovery algorithm once it is converted to a CPDAG\footnote{A CPDAG captures all edges that can be learned through CI constraints and the remaining edges are uninformative.}. To combat finite-sample noise, Algorithm~\ref{alg:sample-version} first sorts the mutual information between $F$ and each variable in ascending order. It then uses the minimum mutual information as the initial threshold for deciding statistical independence and orienting the CPDAG (lines 5--12). It repeatedly raises the threshold to the next smallest mutual information value and recomputes the ranking, stopping once the ranking is no longer consistent (see lines 19--20), that is, once some variable has low mutual information with $F$ but a high CMI given its possible parents; the last consistent ranking is then returned. This ensures that the orientations applied to the CPDAG are consistent with the ranking procedure.
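
As an illustration, consider one pass of Algorithm~\ref{alg:sample-version} on the CPDAG in Figure~\ref{fig:g-e}, whose true augmented graph is shown in Figure~\ref{fig:g-d}. This is a sketch under two assumptions that are ours: the information values are exact (no estimation noise), and $PossPa_{G}(X)$ contains every node joined to $X$ in $G$ by an undirected edge or by a directed edge into $X$. Here $I(F;X_{1}) = 0$, while $I(F;X_{2}) > 0$ and $I(F;X_{3}) > 0$. For any threshold $0 < \alpha \le \min\{I(F;X_{2}), I(F;X_{3})\}$, the orientation loop orients $X_{1} - X_{2}$ as $X_{1} \rightarrow X_{2}$ and leaves $X_{2} - X_{3}$ undirected, which gives
\begin{align*}
    CMI_{X_{1}} &= I(F; X_{1}) = 0, \\
    CMI_{X_{2}} &= I(F; X_{2} \mid X_{1}, X_{3}) > 0, \\
    CMI_{X_{3}} &= I(F; X_{3} \mid X_{2}) = 0.
\end{align*}
The second line is positive because $F$ is adjacent to $X_{2}$ in the true augmented graph, so no conditioning set separates them under faithfulness. The third line vanishes because $X_{2}$ blocks the only path from $F$ to $X_{3}$. Hence $X_{2}$ is ranked first, in agreement with the motivating example, and for $l = 1$ the consistency check does not fire, since the only variable with $I(F;X) < \alpha$ is $X_{1}$, which is not ranked first.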

% \begin{figure}[t]
%     \centering
%     \small
%     \begin{minipage}{0.48\textwidth}
%         \begin{algorithm}[H] 
%             \small
%             \caption{RCA with Causal Graphs (\name)} \label{alg:sample-version}
%             \begin{algorithmic}[1]
%                 \INPUT Observational data $\mathcal{D}$, interventional data $\mathcal{D^{\star}}$, a CPDAG $\mathcal{C}(D)=(\mathbf{V}, \mathbf{E})$, Max. no of root causes $k$,
%                 \OUTPUT top $l$ root causes
%                 \STATE $\alpha  \leftarrow 0.001$; $\tau = 0.001$;  Concatenate $\mathcal{D}$ and $\mathcal{D^{\star}}$ with a binary indicator variable $\fnode$; Create an empty list $L$
%                 \WHILE{\textbf{True}}
%                 \STATE $G \leftarrow \mathcal{C}(D)$
%                 \FOR{$X, Y \in \V$}
%                     \IF{$I(F;X) < \alpha$ and $I(F;Y) \ge \alpha$}
%                     \STATE If $X \leftarrow Y$ is in $G$, remove $X \leftarrow Y$ \label{linenum: leftarrow}
%                     \STATE If $X-Y$ is in $G$, orient $X\rightarrow Y$ \label{linenum: undireced}
%                     % \STATE If $X \circlecircle Y$ is in $G$, orient $X \crightarrow Y$
%                     % \STATE If $X \cleftarrow Y$ is in $G$, orient $X \leftrightarrow Y$
%                     \ENDIF
%                 \ENDFOR
%                 % \STATE $G \leftarrow$ \textbf{MARGINAL-INVARIANCE}($D, G$)
%                 \FOR{$X \in \V$}
%                 % \STATE $\mathbf{Z} \leftarrow \min_{\mathbf{Z}} I(F;X|\mathbf{Z}),$ where  $\mathbf{Z} \subseteq PossPa_{G}(X)$ and $|\mathbf{Z}| \le k$ 
%                 \STATE $I_{X} \leftarrow I(F;X|PossPa_{G}(X))$
%                 \ENDFOR
%                 \STATE $\mathbf{V}_s \leftarrow $Sort $X\in \V$ by $I_{X}$ in descending order
%                 \IF{ $\exists X$ that has $I(F;X) < \tau$ and $I_{X}$ is ranked on top $k$}
%                 \STATE $\alpha = \alpha - \tau$ ; Add $\alpha$ to $L$
%                 \ELSE 
%                 \STATE $\alpha = \alpha + \tau$ ; Add $\alpha$ to $L$
%                 \ENDIF
%                 \IF{$\alpha \in L$}
%                  \STATE \textbf{Return} the first $k$ root causes from $\mathbf{V}_s$.
%                 \ENDIF
%                 \ENDWHILE
%             \end{algorithmic}
%         \end{algorithm}
%     \end{minipage}
% \end{figure}




        \begin{algorithm}[t] 
            \small
            \caption{RCA with Causal Graphs (\name)} \label{alg:sample-version}
            \begin{algorithmic}[1]
                \INPUT Observational data $\mathcal{D}$, interventional data $\mathcal{D^{\star}}$, a CPDAG $\mathcal{C}(D)=(\mathbf{V}, \mathbf{E})$, maximum number of root causes $l$
                \OUTPUT Top-$l$ root causes

                \STATE Concatenate $\mathcal{D}$ and $\mathcal{D^{\star}}$ with a binary indicator variable $\fnode$.
                \FOR{$X \in \V$}
                    \STATE $A_X \leftarrow I(F;X)$
                \ENDFOR
                % \STATE Sort $A$ in ascending order
                \STATE $A \gets$ Sort the values $\{A_{X} : X\in \V\}$ in ascending order
                \STATE Create an empty list $\textbf{V}^{\star}_s$
                \FOR{$\alpha \in A$}
                    \STATE $G \leftarrow \mathcal{C}(D)$
                    \FOR{$X, Y \in \V$}
                        \IF{$I(F;X) < \alpha$ and $I(F;Y) \ge \alpha$}
                        \STATE If $X \leftarrow Y$ is in $G$, remove $X \leftarrow Y$ \label{linenum: leftarrow}
                        \STATE If $X-Y$ is in $G$, orient $X\rightarrow Y$ \label{linenum: undireced}
                        \ENDIF
                    \ENDFOR
                    \FOR{$X \in \V$}
                        \STATE $CMI_{X} \leftarrow I(F;X|PossPa_{G}(X))$
                    \ENDFOR
                    \STATE $\mathbf{V}_s \leftarrow $Sort $X\in \V$ by $CMI_{X}$ in descending order
                    \IF{$\exists X$ such that $I(F;X) < \alpha$ and $X$ is ranked in the top-$l$ of $\mathbf{V}_s$ by $CMI_{X}$}
                        \STATE \textbf{Return} the first $l$ root causes from $\mathbf{V}^{\star}_s$.
                    \ENDIF
                    \STATE $\mathbf{V}^{\star}_s \gets \mathbf{V}_s$
                \ENDFOR
                \STATE \textbf{Return} the first $l$ root causes from $\mathbf{V}^{\star}_s$.
            \end{algorithmic}
        \end{algorithm}









% flexibly accommodate a wide range of scalable and efficient observational causal discovery algorithms that output a CPDAG such as BOSS \cite{andrews2023fast}, XGES \cite{nazaret2021extremely}. } 

% Our key contribution is that only $n$ marginal invariance tests need to be conducted during failure to obtain a superset of the parent set for each non-root-cause variable $\bar{R}$ that d-separates $\bar{R}$ from $F$, where $n$ is the number of observed variables. We have proved the soundness of our algorithm for identifying root causes.


% Instead of learning an essential graph via PC algorithm which conducts a series of high-order CI tests~\citep{spirtes2000causation}, we can use a more robust and generalized version known as the $\mathcal{C}$-PC algorithm \citep{lee2024constraint} to obtain a $\mathcal{C}$-essential graph, a graphical representation of the Markov equivalence class of DAGs learned based on a restrictive set $\mathcal{C}$ of conditioning sets, during the normal operation time. The set $\mathcal{C}$ allows us to specify which conditioning set to use on top of an empty set for all CI tests, giving us the ability to rely on CI tests with small conditioning set size and without high-dimensional variables. This feature is desirable as the statistical power of CI tests can drastically reduce as the conditioning set size increases or the support of the set is large. Please see Appendix \ref{app:graphnotations} and \ref{app:samplerun-cpc} for details about the $\mathcal{C}$-essential graph and its interpretation. We also provide discussions on the challenges of using CI tests only for RCA given a $\mathcal{C}$-essential graph in Appendix \ref{app:motivating-examples-and-challenges}. Our contribution is to show that one only needs to run $n$ marginal invariance tests during the failure time to obtain a superset of the parents set of each non-root-cause variable $\bar{R}$ that d-separate  $\bar{R}$ from $F$ given a $\mathcal{C}$-essential graph, where $n$ is the number of observed variables. While Lemma \ref{lem:no_causal_path} ensures the correctness of Algorithm \ref{alg:marginal-invariance-test}, Lemma  \ref{lem:conditioning_on_posspa} connects Algorithm \ref{alg:sample-version} with Proposition \ref{prop:maximizing_cmi} via the use of possible parent sets. 

% \begin{restatable}{lemma}{nocausalpath}\label{lem:} 
% \label{lem:no_causal_path}
% Given a distribution $P$ defined over a set of CIs based on a conditionally closed set $\mathcal{C}$, for any $X, Y \in \V$ and $\Z\in \mathcal{C}$, if $(\ci{X}{Y}{\Z})_{P}, (\nci{X}{W}{\Z})_{P}$, then no DAG faithful to $P$ contains the edge $W \rightarrow Y$.
% \end{restatable}

% \begin{restatable}{lemma}{optimal}\label{lem:conditioning_on_posspa} 
% Let $M$ be the graph returned by  Algorithm \ref{alg:marginal-invariance-test},  $F$ is not adjacent to $X$ in $D_{aug}$ if and only if $F$ is d-separated with $X$ given $PossPa_{M}(X)$ in $D_{aug}$.  
% \end{restatable}
\begin{restatable}{theorem}{soundness}\label{thm:soundness} 
\rev{Given a CPDAG output by any sound causal discovery algorithm, and under causal sufficiency and the extended faithfulness assumption, Algorithm~\ref{alg:sample-version} returns the true root cause variables.}
\end{restatable}



% \begin{restatable}{corollary}{corollary}\label{cor:any_cpdag_algorithm} 
% \rev{Given a partial causal structure from any sound causal discovery algorithms and under causal sufficiency and the extended faithfulness assumption, Algorithm \ref{alg:sample-version} returns the true root cause variables. } 
% \end{restatable}





% \begin{proofS}
% We utilize the fact that $F$ is marginal dependent with an observed variable $X$ if and only if $F$ has a directed path to $X$. We then investigate various backdoor active path structures based on a subgraph of the given a $\mathcal{C}$-essential graph after conducting a series of marginal invariance tests based on Lemma \ref{lem:no_causal_path}.  
% \end{proofS}

% Next, we briefly discuss the trade-off between computational efficiency and sample complexity in Algorithm~\ref{alg:sample-version}. As noted by Corollary~\ref{cor:size_of_poss_pa}, a larger set $\mathcal{C}$ allows the $\mathcal{C}$-PC algorithm to conduct more CI tests, potentially including high-order tests. While this tends to result in a sparser graph, it also increases the time needed to learn the causal graph during normal operations and requires more samples for reliable CI tests. The goal is to reduce the set of possible parents during normal operation by conducting more informative CI tests based on data reliability. Although our method can leverage advancements in consistent CMI estimators for high-dimensional datasets~\citep{mukherjee2020ccmi, NEURIPS2023_48db6744}, a smaller set of possible parents will reduce the time needed to compute CMI during critical failure situations. We provide more discussion on this topic in the Appendix~\ref{app:trade-off-sample-computational-efficiency}. 

% Corollary \ref{cor:size_of_poss_pa} tells us that if one is willing to obtain a more refined $\mathcal{C}$-essential graph during the normal operation time by using more CI tests, then the size of the possible parents set of each observed variable can be reduced to facilitate the computation of conditional mutual information during the failure time. 

% \begin{restatable}
%     {corollary}{sizeofposspa}\label{cor:size_of_poss_pa}
%     Given two graphs $M_{1}, M_{2}$ returned by Algorithm $\ref{alg:marginal-invariance-test}$ based on two different $\mathcal{C}$-essential graphs $\varepsilon_{\mathcal{C}_{1}}(D)$ and $\varepsilon_{\mathcal{C}_{2}}(D)$, if $\mathcal{C}_{1} \subset \mathcal{C}_{2}$, then $|PossPa_{M_{1}}(X)| \ge |PossPa_{M_{2}}(X)|$.
% \end{restatable}


% there is a wealth of literature on methods for learning essential graphs at scale~\citep{kalisch2007estimating, le2016fast, zarebavani2019cupc, hagedorn2021gpu, hagedorn2022gpu}.
% The benefit of doing so is to reduce the sample size and time required to compute CMI during the failure time. Also, our proposed method can leverage the recent advancement of various consistent estimators of CMI with high-dimensional datasets \citep{pmlr-v80-belghazi18a, mukherjee2020ccmi, NEURIPS2023_48db6744}. 


% \begin{corollary}
% \label{cor:size_of_poss_pa}
%     Given two graphs $M_{1}, M_{2}$ returned by Algorithm $\ref{alg:marginal-invariance-test}$ based on two different $\mathcal{C}$-essential graphs $\varepsilon_{\mathcal{C}_{1}}(D)$ and $\varepsilon_{\mathcal{C}_{2}}(D)$, if $\mathcal{C}_{1} \subset \mathcal{C}_{2}$, then $|PossPa_{M_{1}}(X)| \ge |PossPa_{M_{2}}(X)|$
% \end{corollary}

% \begin{lemma}
%     Under the local Markov condition and the faithfulness assumption, given a DAG $D=(V , E )$ and any $\Delta = I(F; R | Pa_{D}(R))- I(F; \bar{R} | Pa_{D}(\bar{R})) > 0, \delta$ of the desired accuracy and confidence parameters, we have
%     \begin{equation}
%         Pr(\widehat{I(F; R | Pa_{D}(R))} > \widehat{I(F; \bar{R} | Pa_{D}(\bar{R}))} ) \ge 1- \delta
%     \end{equation} whenever the number of samples $N$ satisfies
%     \begin{equation}
%         N \ge \frac{2}{\Delta^{2}}\log\left(\frac{4ab}{\delta}\right)
%     \end{equation}
%      where $F$ is the failure modeled as an intervention, $R$ is the root cause, and $\bar{R}$ is a non-root-cause variable, and $a= \max\{|R|, |\bar{R}|\}$, $b= \prod_{i}^{m}|X_{i}|$ for $X_{i} \in Pa_{D}(X)$ and $|X_{i}|$ is the number of states. 
% \end{lemma}
% \begin{proof}
%     Without loss of generality, we let $\mathbf{Z}$ be the largest possible $Pa_{D}(X)$ for any node $X$ in $D$. We will first show that 
%     \begin{equation}
%         P(|I(F; X | \mathbf{Z})- \widehat{I(F;X|\mathbf{Z})}| \le \epsilon) \le 1- \delta
%     \end{equation}
%     To approximate $I(F;X|\mathbf{Z})$, we need to estimate the joint probabilities $Pr(F =f, X=x, Z_{1}=z_1, \ldots , Z_{m}= z_m)$. Thus, the total number of states is $\mathcal{N} = |F| |X| \prod_{i}^{m}|Z_{i}|$. By Hoeffding's inequality, we know that 
%     \begin{align}
%         P(|P(f,x,z_1, \ldots, z_m)- \hat{P}(f,x,z_1, \ldots, z_m) | \ge \epsilon) \\\le 2\exp(-2N\epsilon^{2}) \notag
%     \end{align}
%     Applying the union bound over all probability estimates, we have 
%     \begin{align}
%     &P(\max_{f,x,z_{1},\ldots,z_{m}}|P(f,x,z_1, \ldots, z_m)- \hat{P}(f,x,z_1, \ldots, z_m)| \ge \epsilon) \\
%          &\le |F||X|\prod_{i}^{m}|Z_{i}| 2\exp(-2N\epsilon^{2}) \notag
%     \end{align}. Let $\delta >0$ such that 
%     \begin{equation}
%         |F||X|\prod_{i}^{m}|Z_{i}| 2\exp(-2N\epsilon^{2}) \le \delta
%     \end{equation}. Solving for $N$, we have that
%     \begin{equation}\label{eq:bound}
%         N \ge \frac{1}{2\epsilon^{2}} \log \left(\frac{2|F||X|\prod_{i}^{m}|Z_{i}|}{\delta}\right)
%     \end{equation}
%     We know that 
%     \begin{align}
%         &\widehat{I(F;R|Pa_{D}(R))} = I(F;R|Pa_{D}(R)) + \epsilon_{1} \\
%         &\widehat{I(F;\bar{R}|Pa_{D}(\bar{R}))} = I(F;\bar{R}|Pa_{D}(\bar{R})) + \epsilon_{2} 
%     \end{align}, where $\epsilon_{1}, \epsilon_{2}$ are the deviations of empirical estimates from the true values, bounded by $\epsilon$. To ensure the true inequality holds for the empirical estimates, we need
%     \begin{equation}
%         I(F;R|Pa_{D}(R)) + \epsilon_{1} > I(F; \bar{R}|Pa_{D}(\bar{R}) + \epsilon_{2}
%     \end{equation}
    
%     Since  $\epsilon_{1}, \epsilon_{2}$ are bounded by $\epsilon$. Let $\Delta = I(F; R|Pa_{D}(R)) - I(F; \bar{R}|Pa_{D}(\bar{R}))$. By the local Markov condition and the faithfulness assumption, we have $\Delta > 0$. 
%     With $\Delta>0$, we can choose $\epsilon$ such that $\Delta > 2\epsilon$. Substituting $\epsilon < \frac{\Delta}{2}$ into (\ref{eq:bound}), we have that
% \begin{equation}
%         N \ge \frac{2}{\Delta^{2}} \log \left(\frac{2|F||X|\prod_{i}^{m}|Z_{i}|}{\delta}\right)
%     \end{equation}. The result follows as $|F| = 2$. 
%  \end{proof}

% List of contributions:

% Kenneth:

% - I wrote all sections in the appendix except for the related work and the section that talks about the sample run of RCD.
% - I initiated the idea of using mutual information for RCA. I later came up with a way to combine kPC with CMI to outperform MI. I prove all theories for the correctness of this approach. 
% - I wrote all the codes that have been used for the proposed algorithm RCG from learning the graphs by parallelizing CI tests, computing conditional mutual information in a more efficiency way, and applying zero order invariance tests. These codes were translated by Azam to adapt his existing codebase due to package differences. 
% - I wrote the background, and the first two paragraphs of problem formulation. I wrote the second paragraph in section 4, also added an example to illustrate lemma 4.1, 4.2. I wrote the content of section 4 starting from lemma 4.1 to the end of section 4. In section 5, I wrote the content starting from "Proposition 5.1" to the end of the section. 
% - I help edited Figure 1, Tables 1,2,3. 

% Azam:

% - Came up with the idea that reversing the edges with IGS can solve RCA with DAG. Did not prove the reduction.
% - Wrote introduction, related work, problem formulation, and partially section 4 (known graph) and section 5 (unknown graph). I also wrote the experiment and case study section.
% - I drew the initial figure 1 to show the overall framework.
% - I integrated all versions of ikpC with the existing pipeline to run experiments on synthetic data. Further implemented PageRank and IGS.
% - Collected the data for Sockshop experiments and ran all the baselines with Sockshop data.
% - Communicated with Shubham on how to collect the Adobe dataset, Ran the experiments on that real-world dataset.

