

\section{Problem formulation}

In causal inference, we are interested in the following quantity.

\begin{definition}[Causal effect]
    The causal effect of an intervention $do(\X=\x)$ on an outcome $\Y=\y$ is defined as $P_{\x}(\y)$, the probability of $\Y=\y$ under the interventional distribution $P_{\x}$.
\end{definition}

The challenge is that this expression cannot be evaluated directly: we only have access to the observational distribution $P$, not to the experimental distribution $P_{\x}$ that defines its value. In general, multiple SCMs $\M$ may entail the same observational distribution $P(\V)$ yet yield different values of the causal effect $P_{\x}(\y)$, so no number of samples from $P(\V)$ suffices to pin down $P_{\x}(\y)$. This motivates the problem of partial identification, defined next.

\begin{definition}[Partial Identification]
    \label{def:partial_identification}
    The causal effect $P_{\x}(\y)$ is said to be partially identifiable from $P(\V)$ if $P(\V)$ determines an interval $[a, b]$, strictly contained in $[0,1]$, such that $P_{\x}(\y)\in[a,b]$ for every SCM $\M$ that induces $P(\V)$.
\end{definition}

We now introduce the so-called \textit{natural bounds} (NBs), due to \citet{manski1990nonparametric,robins1989analysis}, which are a function of the observational distribution that validly bounds $P_{\x}(\y)$ irrespective of the causal structure of the system.

\begin{definition}
    The natural bounds (NBs) for a causal effect $P_{\x}(\y)$ are given by,
    \begin{align}
    \label{eq:natural_bounds}
     P(\x,\y) \leq P_{\x}(\y)\leq P(\x,\y) + 1 - P(\x).
\end{align}
\end{definition}
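A short derivation shows why \Cref{eq:natural_bounds} holds. It assumes the standard consistency property of counterfactuals, namely that the potential outcome $\Y_{\x}$ (notation introduced here) coincides with the observed $\Y$ whenever $\X=\x$, and it writes $P_{\x}(\y)=P(\Y_{\x}=\y)$:
\begin{align*}
    P_{\x}(\y) &= P(\Y_{\x}=\y, \X=\x) + P(\Y_{\x}=\y, \X\neq\x) \\
               &= P(\x,\y) + P(\Y_{\x}=\y, \X\neq\x),
\end{align*}
where the second line uses consistency. The remaining term involves units that were not exposed to $\x$, so it is not constrained by $P(\V)$ beyond $0\leq P(\Y_{\x}=\y,\X\neq\x)\leq P(\X\neq\x)=1-P(\x)$, which yields both inequalities.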

In words, this result states that causal effects are naturally partially identifiable: whenever $P(\x)>0$, the interval in \Cref{eq:natural_bounds} has width $1-P(\x)<1$ and is therefore strictly contained in $[0,1]$. Moreover, the NBs have been shown to be tight in several examples, in the sense that there exist two models $\M^1,\M^2$ that both entail $P(\V)$ and evaluate $P_{\x}(\y)$ to the lower and the upper NB, respectively. One example is the query $P_b(x)$ given $\G$ in \Cref{fig:examples:a}, for which the NBs are tight.
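The tightness of the NBs can be checked numerically in the simplest case. The Python sketch below assumes binary $X$ and $Y$, an arbitrarily chosen observational distribution, and a latent variable $U=(X, Y_0, Y_1)$ encoding response types, with $X := U_X$ and $Y := Y_X$. This construction, the numbers, and all function names are illustrative assumptions, not the model of \Cref{fig:examples:a}. It builds two SCMs that entail the same $P(X,Y)$ but evaluate $P_{X=1}(Y=1)$ to the lower and upper NB, respectively.

```python
from fractions import Fraction
from itertools import product

# Hypothetical observational joint P(X, Y) over binary X, Y as {(x, y): prob}.
# Exact fractions avoid floating-point noise in the equality checks below.
P_obs = {
    (0, 0): Fraction(1, 5),
    (0, 1): Fraction(1, 5),
    (1, 0): Fraction(3, 10),
    (1, 1): Fraction(3, 10),
}

def natural_bounds(p, x=1, y=1):
    """Natural bounds [P(x,y), P(x,y) + 1 - P(x)] on P_x(y)."""
    p_x = sum(prob for (xv, _), prob in p.items() if xv == x)
    lower = p[(x, y)]
    return lower, lower + 1 - p_x

def response_type_model(p, y1_for_untreated):
    """Hypothetical SCM with latent U = (X, Y0, Y1), X := U_X and Y := Y_X.

    Treated units (X = 1) get their observed Y as Y1; untreated units (X = 0)
    get their observed Y as Y0 and the fixed value `y1_for_untreated` as Y1.
    Returns the distribution of U as {(x, y0, y1): prob}.
    """
    q = {}
    for (xv, yv), prob in p.items():
        if xv == 1:
            key = (1, 0, yv)  # Y0 is never observed here; set to 0 arbitrarily
        else:
            key = (0, yv, y1_for_untreated)
        q[key] = q.get(key, 0) + prob
    return q

def observational(q):
    """P(X, Y) entailed by the model, using Y = Y_X."""
    p = {k: Fraction(0) for k in product((0, 1), repeat=2)}
    for (xv, y0, y1), prob in q.items():
        p[(xv, y1 if xv == 1 else y0)] += prob
    return p

def interventional(q, y=1):
    """P_{X=1}(Y=y), which equals P(Y1 = y) under do(X=1)."""
    return sum(prob for (_, _, y1), prob in q.items() if y1 == y)

lo, hi = natural_bounds(P_obs)
M1 = response_type_model(P_obs, y1_for_untreated=0)  # attains lower NB
M2 = response_type_model(P_obs, y1_for_untreated=1)  # attains upper NB

print(lo, hi)  # 3/10 7/10
print(observational(M1) == P_obs, observational(M2) == P_obs)  # True True
print(interventional(M1), interventional(M2))  # 3/10 7/10
```

Both models reproduce $P(X,Y)$ exactly because the observed outcome only ever reveals $Y_X$; they differ solely in the unobserved $Y_1$ of untreated units, which is precisely the term left free by the NBs.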

For other queries, involving variables that are more ``separated'' in the underlying causal system, better bounds may be derived by exploiting the implications of this separation for the entailed observational and interventional distributions. For example, we would expect that if $(\Z \indep \Y)_P$, then also $P_{\x,\z}(\y)=P_{\x}(\y)$, so that the NBs could be improved. Statistical constraints of this type are imprinted by the structure of the underlying causal system on the observed distribution $P(\V)$. More generally, a $d$-separation between sets of nodes in a causal diagram implies a corresponding conditional independence between the associated variables in $\V$. The reverse implication, \textit{i.e.}, that each conditional independence in the data implies a corresponding $d$-separation in the underlying causal diagram, is known as faithfulness. In particular, for three sets of variables $\X,\Y,\Z$ with a distribution $P(\X,\Y,\Z)$ induced by a causal model with causal diagram $\G$, faithfulness asserts that
\begin{align*}
    (\X\indep\Y\mid\Z)_P \Rightarrow (\X\indep\Y\mid\Z)_\G.
\end{align*}
This condition serves as a statistically testable constraint that narrows the class of compatible causal models \citep{pearl:88a,meek:1995,zhang2006causal}. In this paper, we ask whether tighter bounds can be derived under the assumption of faithfulness.
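To illustrate how faithfulness narrows the class of compatible diagrams, the Python sketch below tests $d$-separation with the moralized ancestral graph criterion and discards candidate diagrams that fail to $d$-separate an observed independence. It assumes a DAG in which every variable, including any latent confounder, appears as an explicit node; the three candidate graphs over $A,B,C$ and the observed independence $(A\indep C)_P$ are hypothetical.

```python
from itertools import combinations

def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def d_separated(parents, xs, ys, zs):
    """Test (xs _||_ ys | zs)_G via the moralized ancestral graph criterion.

    `parents` maps each node of a DAG to the set of its parents.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)
    # Moral graph of the ancestral subgraph: connect each node to its parents
    # and "marry" every pair of parents that share a child.
    adj = {v: set() for v in keep}
    for child in keep:
        pa = [p for p in parents.get(child, ()) if p in keep]
        for p in pa:
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(pa, 2):
            adj[a].add(b)
            adj[b].add(a)
    # Delete the conditioning set and search for a path from xs to ys.
    seen = xs - zs
    stack = list(seen)
    while stack:
        v = stack.pop()
        if v in ys:
            return False
        for w in adj[v]:
            if w not in seen and w not in zs:
                seen.add(w)
                stack.append(w)
    return True

# Hypothetical candidate diagrams over A, B, C.
candidates = {
    "chain A->B->C": {"A": set(), "B": {"A"}, "C": {"B"}},
    "fork A<-B->C": {"A": {"B"}, "B": set(), "C": {"B"}},
    "collider A->B<-C": {"A": set(), "B": {"A", "C"}, "C": set()},
}

# Suppose the data show (A _||_ C)_P. Under faithfulness this independence must
# be mirrored by a d-separation, which rules out the other diagrams.
compatible = [name for name, g in candidates.items()
              if d_separated(g, {"A"}, {"C"}, set())]
print(compatible)  # ['collider A->B<-C']
```

Without faithfulness, all three diagrams remain compatible with $(A\indep C)_P$, since a chain or fork can still induce this independence through a cancellation of parameters.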


