Root Cause Analysis of Failure with Observational Causal Discovery

27 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0
Keywords: causal discovery, root cause analysis
TL;DR: We show how to use the causal graph for identifying failures in software systems.
Abstract: Finding the root cause of failures is a prominent problem in many complex networks. Causal inference provides us with tools to address this problem algorithmically to automate this process and solve it efficiently. The existing methods either use a known causal structure to identify root cause via backtracking the changes, or ignore the causal structure but rely on invariance tests to identify the changing causal mechanisms after the failure. We first establish a connection between root cause analysis and the \textit{Interactive Graph Search (IGS)} problem. This mapping highlights the importance of causal knowledge: we demonstrate that any algorithm relying solely on marginal invariance tests to identify root causes must perform at least $\Omega(\log_{2}(n) + d\log_{1+d}n)$ many tests, where $n$ represents the number of components and $d$ denotes the maximum out-degree of the graph. We then present an optimal algorithm that achieves this bound by reducing the root cause identification problem as an instance of IGS. Moreover, we show that even if the causal graph is partially known in the form of a Markov equivalence class, we can identify the root-cause with linear number of invariance tests. Our experiments on a production-level application demonstrate that, even in the absence of complete causal information, our approach accurately identifies the root cause of failures.
Primary Area: causal reasoning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11851
Loading