
\section{Introduction}
\label{sec:intro}

Causal relations are a prominent paradigm for describing our interactions with the world around us. We rely on them to make sense of notions of fairness, extrapolation, and safety in AI systems that play an increasingly important role in society. At the center of the notion of causality lies the idea of manipulation or intervention. A typical question in this context is: ``What would happen to outcome $Y$ if $X$ were set to $x$?'' For example, a physician might be interested in how a biomarker $Y$ responds to a new dosage $x$ of drug $X$; or an economist might wonder about the trajectory of economic indicators $Y$ under an interest rate hike $X=x$ in a given country $Z=z$. These questions all appeal to the same formal machinery: they aim at establishing facts about (conditional) \emph{causal effects}, written, \emph{e.g.}, $P_{x}(y \mid z)$.

In general, it is impossible to infer the effect of interventions from data alone (without physically manipulating reality), as further domain knowledge or assumptions are typically needed to uniquely pin down causal effects. This motivates the study of a problem known as \emph{(partial) causal identification} \citep{pearl2009causality}. The idea is to combine data from an observational distribution $P(\V)$ with partial knowledge of the domain, articulated as a causal diagram, to bound a causal effect $P_{x}(y\mid z)$ within a tight interval. In other words, the problem is to infer a set of values that contains the effects implied by all causal models consistent with the data and assumptions. If the effect can be uniquely determined, it is said to be point identified, and the interval reduces to a single value. 

\begin{figure*}[t]
\centering
\hfill\null
\begin{subfigure}[t]{0.30\linewidth}\centering%(d)
  \begin{tikzpicture}[SCM,scale=1]
        \node (A) at (0,0) {$A$};
        \node (B) at (0,1) {$B$};
        \node (X) at (1.3,0) {$X$};
        \node (C) at (2.6,0) {$C$};
        \node (D) at (2.6,1) {$D$};

        \path [conf-path] (B) edge[out=0, in=100] (X);
        \path [->] (B) edge (X);
        \path [->] (A) edge (X);
        \path [->] (X) edge (D);
        \path [->] (X) edge (C);
        \path [->] (D) edge (C);
        \path [conf-path] (D) edge[out=-50, in=50] (C);
    \end{tikzpicture}
\caption{$\G$}
\label{fig:examples:a}
\end{subfigure}\hfill
\begin{subfigure}[t]{0.30\linewidth}\centering%(d)
  \begin{tikzpicture}[SCM,scale=1]
        \node (A) at (0,0) {$A$};
        \node (B) at (0,1) {$B$};
        \node (X) at (1.3,0) {$X$};
        \node (C) at (2.6,0) {$C$};
        \node (D) at (2.6,1) {$D$};

        \path [arrows = {Circle[fill=white]->}] (A) edge (X);
        \path [arrows = {Circle[fill=white]->}] (B) edge (X);
        \path [->] (X) edge node[above] {\scriptsize{$v$}} (D);
        \path [->] (X) edge node[above] {\scriptsize{$v$}} (C);
        \path [arrows = {Circle[fill=white]-Circle[fill=white]}] (C) edge (D);
    \end{tikzpicture}
\caption{$\1P$}
\label{fig:examples:b}
\end{subfigure}\hfill
\begin{subfigure}[t]{0.30\linewidth}\centering%(d)
  \begin{tikzpicture}[SCM,scale=1]
        \node (A) at (0,0) {$A$};
        \node (B) at (0,1) {$B$};
        \node (X) at (1.3,0) {$X$};
        \node (C) at (2.6,0) {$C$};
        \node (D) at (2.6,1) {$D$};

        \path [conf-path] (B) edge[out=0, in=100] (X);
        \path [conf-path] (A) edge[out=20, in=160] (X);
        \path [->] (X) edge node[above] {\scriptsize{$v$}} (D);
        \path [->] (X) edge node[above] {\scriptsize{$v$}} (C);
        \path [arrows = {Circle[fill=white]-Circle[fill=white]}] (C) edge (D);
    \end{tikzpicture}
\caption{$\1P_{\widetilde{X}}$}
\label{fig:examples:c}
\end{subfigure}
\hfill\null
  \caption{Examples of diagrams.}
  \label{fig:examples}
\end{figure*}

One of the foundational results in the literature is due to \cite{manski1990nonparametric,robins1989analysis}. The authors showed that causal effects can be bounded with observational data without making any assumptions on the structure or causal diagram of the underlying data generating mechanisms. This approach has since been generalized to bound causal effects under instrumental variable assumptions \citep{robins1989analysis}, and under more general causal diagrams \citep{zhang2019near,zhang2020designing}. In parallel, several authors have shown that, with a sufficiently expressive parameterization of the underlying causal model, bounds can also be computed by performing inference on model parameters, with proposals based on polynomial optimization programs \citep{balke1997bounds,hu2021generative,padh2022stochastic,li2022bounds} and on Bayesian methods \citep{chickering1996clinician,zhang2021partial,finkelstein2020deriving}. Many of these works derive bounds under specific assumptions about the structure and form of the underlying data generating mechanism. In practice, this formulation is often too rigid, as such assumptions are hard to justify and test, and are sometimes even known to be unrealistic. A sensible concern is that committing to a single diagram or parametric model family may encode false modeling assumptions and lead to misleading inferences about causal effects.
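To make assumption-free bounds concrete, consider the simplest case of a discrete treatment $X$ and outcome $Y$. By consistency, $P_x(y) = P(x, y) + P(Y_x = y, X \neq x)$, and the second term is not identified but lies in $[0, 1 - P(x)]$, which gives $P(x,y) \le P_x(y) \le P(x,y) + 1 - P(x)$. The Python sketch below computes this interval from a joint table $P(X,Y)$; the function name \texttt{natural\_bounds} and the numbers in the table are hypothetical, chosen only for illustration.

```python
from typing import Dict, Tuple


def natural_bounds(p_xy: Dict[Tuple[int, int], float], x: int, y: int) -> Tuple[float, float]:
    """Assumption-free bounds on P_x(y) from an observational table P(X, Y).

    By consistency, P(Y_x = y) = P(X = x, Y = y) + P(Y_x = y, X != x).
    The second term is not identified but lies in [0, 1 - P(x)].
    """
    p_x = sum(p for (xi, _), p in p_xy.items() if xi == x)  # P(X = x)
    p_joint = p_xy.get((x, y), 0.0)                          # P(X = x, Y = y)
    return p_joint, p_joint + 1.0 - p_x


# Hypothetical observational distribution over binary X and Y.
p_xy_table = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.375}
print(natural_bounds(p_xy_table, x=1, y=1))  # (0.375, 0.875)
```

The width of this interval is always $1 - P(x)$, so such bounds are informative only when the treatment value $x$ is frequent in the data; exploiting structural knowledge is what allows tighter intervals.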

In the spirit of designing more ``data-driven'' AI systems, one approach to circumvent the need for prior knowledge is to first learn the causal diagram from data and then perform identification from there. The statistical constraints found in data (\emph{e.g.}, conditional independencies) can be leveraged to infer a class of Markov equivalent (ME) causal diagrams, commonly represented as a Partial Ancestral Graph (PAG) \citep{richardson2002ancestral,spirtes2000causation,zhang2008completeness}. For example, the PAG $\1P$ in \Cref{fig:examples:b} encodes the ME class of causal diagrams consistent with data generated according to the causal diagram $\G$ in \Cref{fig:examples:a}. In the PAG $\1P$, directed edges encode ancestral relations (not necessarily direct ones), and circle marks stand for structural uncertainty. Directed edges labeled with $v$ signify the absence of unmeasured confounders. Identification from PAGs (determining whether causal effects can be uniquely computed) is of increasing interest, and several techniques for the identification of causal effects have recently been developed \citep{zhang2008causal,zhang2008completeness,jaber2018causal,hyttinen2015calculus,perkovic2018complete,jaber2019identification,jaber2018graphical}, including a calculus and complete algorithms \citep{jaber2019causal, jaber2022causal}. 
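One way to make the edge marks of a PAG concrete is to store, for every edge, the mark at each endpoint: a tail, an arrowhead, or a circle (a mark that is not shared by all members of the equivalence class). The Python sketch below encodes the PAG $\1P$ of \Cref{fig:examples:b} in this way; the class \texttt{PAGEdge}, its fields, and the mark constants are hypothetical and are not taken from any PAG library.

```python
from dataclasses import dataclass
from typing import List

# Endpoint marks (hypothetical encoding).
TAIL, ARROW, CIRCLE = "tail", "arrow", "circle"


@dataclass(frozen=True)
class PAGEdge:
    u: str
    v: str
    mark_u: str            # mark at the endpoint touching u
    mark_v: str            # mark at the endpoint touching v
    visible: bool = False  # edge labeled 'v': no unmeasured confounder

    def is_directed(self) -> bool:
        # u -> v: u is an ancestor of v in every diagram of the class
        return self.mark_u == TAIL and self.mark_v == ARROW

    def has_circle(self) -> bool:
        # At least one endpoint mark is uncertain across the class
        return CIRCLE in (self.mark_u, self.mark_v)


# The PAG of Figure (b): A o-> X, B o-> X, X -> D (v), X -> C (v), C o-o D.
pag: List[PAGEdge] = [
    PAGEdge("A", "X", CIRCLE, ARROW),
    PAGEdge("B", "X", CIRCLE, ARROW),
    PAGEdge("X", "D", TAIL, ARROW, visible=True),
    PAGEdge("X", "C", TAIL, ARROW, visible=True),
    PAGEdge("C", "D", CIRCLE, CIRCLE),
]

uncertain = [(e.u, e.v) for e in pag if e.has_circle()]
visible_directed = [(e.u, e.v) for e in pag if e.is_directed() and e.visible]
print(uncertain)         # [('A', 'X'), ('B', 'X'), ('C', 'D')]
print(visible_directed)  # [('X', 'D'), ('X', 'C')]
```

Storing marks per endpoint, rather than one edge type, keeps partially oriented edges such as $A \circ\!\!\rightarrow X$ representable without special cases.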

In this paper, we pursue a generalization of the partial identification task that consists of bounding causal effects with more restricted domain knowledge in the form of a class of ME causal diagrams (instead of a fully specified causal diagram). This notion is ``data-driven'' in the sense that equivalence classes can, in principle, be inferred from observational data $P(\V)$ alone, up to an assumption of faithfulness\footnote{In practice, an assumption of strong faithfulness is typically required for consistently recovering (asymptotically, without error) the true PAG from finite samples \citep{robins2003uniform,zhang2012strong}.}. Our main contribution is to show that the data entail constraints on the value of causal effects that can be exploited to derive tighter bounds than previously considered. More specifically, we summarize our contributions as follows.
\begin{itemize}[left=0cm]
    \item \textbf{Section~\ref{sec:partial-identification}}. We derive analytical expressions (closed-form, in terms of $P(\V)$) for lower and upper bounds on a causal effect of interest given observational data based on the structure of a PAG (\Cref{alg:partialid}). In particular, we show that the proposed bounds outperform the bounds due to \cite{manski1990nonparametric,robins1989analysis} in general (\Cref{prop:bounds_vs_natural}) and provide several examples. 
    \item \textbf{Section~\ref{sec:enumeration}}. We investigate enumeration strategies, \textit{i.e.}, listing the ME causal diagrams and performing partial identification on each diagram separately using existing ``diagram-specific'' bounding techniques. We show that a large portion of ME causal diagrams are, in fact, ``redundant'' for the purpose of bounding causal effects (\Cref{prop:expressiveness_leg,prop:expressiveness_causal_graphs_in_leg}). Despite this simplification, we conjecture that a large number of graphs (increasing with the number of nodes) must still be considered in general (\Cref{prop:nonredundancy_causal_graphs_in_leg}), which suggests that enumeration strategies might be computationally intractable.
\end{itemize}

\subsection{Preliminaries}
\label{sec:preliminaries}

We use capital and small letters to denote random variables and their values respectively, \textit{e.g.} $X$ and $x$, and bold capital and small letters to denote sets of variables and their values, \textit{e.g.} $\X$ and $\x$. We use $P(\x)$ as an abbreviation for probability $P(\X = \x)$, and similarly for conditional probabilities. For sets of variables $\X,\Y,\Z$, conditional independence in $P$ is denoted $(\X\indep\Y\mid\Z)_P$ and $d$-separation\footnote{The criterion of $d$-separation follows \cite[Def.~1.2.3]{pearl2009causality}.} in a graph $\G$ is denoted $(\X\indep\Y\mid\Z)_\G$. %Finally, $\I \{\cdot\}$  is the indicator function that equals $1$ if the statement in $\{\cdot\}$ is true, and equal to $0$ otherwise.
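$d$-separation can be checked mechanically. A minimal Python sketch, assuming the causal diagram is a DAG in which each bidirected edge $V \dashleftarrow\dashrightarrow W$ has been replaced by an explicit latent parent shared by $V$ and $W$, uses the standard equivalent criterion: $\X$ and $\Y$ are $d$-separated by $\Z$ if and only if $\Z$ separates them in the moralized subgraph induced by the ancestors of $\X\cup\Y\cup\Z$. The function names and the latent nodes \texttt{U\_BX} and \texttt{U\_CD} used to encode $\G$ from \Cref{fig:examples:a} are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, Set


def ancestors(parents: Dict[str, Set[str]], nodes: Iterable[str]) -> Set[str]:
    """Return `nodes` together with all of their ancestors."""
    result: Set[str] = set()
    stack = list(nodes)
    while stack:
        n = stack.pop()
        if n not in result:
            result.add(n)
            stack.extend(parents.get(n, set()))
    return result


def d_separated(parents: Dict[str, Set[str]], xs: Set[str], ys: Set[str], zs: Set[str]) -> bool:
    """(X indep Y | Z)_G via separation in the moral graph of An(X u Y u Z)."""
    anc = ancestors(parents, xs | ys | zs)
    adj: Dict[str, Set[str]] = {n: set() for n in anc}
    for child in anc:
        pa = sorted(parents.get(child, set()))  # parents of an ancestor are ancestors
        for p in pa:                            # keep parent-child links, undirected
            adj[p].add(child)
            adj[child].add(p)
        for p, q in combinations(pa, 2):        # "marry" co-parents
            adj[p].add(q)
            adj[q].add(p)
    # Search for an X-Y path that avoids Z in the moral graph.
    seen, stack = set(xs), list(xs)
    while stack:
        n = stack.pop()
        if n in ys:
            return False
        for m in adj[n] - zs - seen:
            seen.add(m)
            stack.append(m)
    return True


# G of Figure (a); latent parents U_BX and U_CD encode B <-> X and D <-> C.
parents_G = {
    "X": {"A", "B", "U_BX"},
    "B": {"U_BX"},
    "D": {"X", "U_CD"},
    "C": {"X", "D", "U_CD"},
}
print(d_separated(parents_G, {"A"}, {"B"}, set()))  # True: collider X is not conditioned on
print(d_separated(parents_G, {"A"}, {"B"}, {"X"}))  # False: conditioning on X opens A -> X <- B
```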

The framework that underpins causal effects and diagrams rests on Structural Causal Models (SCMs), following \cite[Def.~7.1.1]{pearl2009causality}. An SCM $\M$ is a tuple $\langle \V, \U, \1F, P(\U) \rangle$, where $\V$ is a set of endogenous (observed) variables, $\U$ is a set of exogenous latent variables, and $\1F=\{f_V\}_{V\in \V}$ is a set of functions such that $f_V$ determines the value of $V$ taking as arguments variables $\boldsymbol{Pa}_V \subseteq \V$ and $\U_V \subseteq \U$, \textit{i.e.}, $V \leftarrow f_{V}(\boldsymbol{Pa}_V, \U_V)$. Values of $\U$ are drawn from an exogenous distribution $P(\u)$. We assume the model to be recursive, \textit{i.e.}, that there are no cyclic dependencies among the variables, and that the domains of $\V$ are discrete and finite. An intervention on a subset $\X\subset \V$, denoted by $do(\x)$, induces a sub-model $\M_{\x}$ in which $\X$ is set to constants $\x$, replacing the functions $\{f_{X}:X\in\X\}$ that would normally determine their values. The distribution of a set of variables $\Y$ in $\M_\x$ is denoted $P_\x(\Y)$.
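The definition of an SCM and of the sub-model $\M_{\x}$ translates directly into code when $\U$ has a finite domain: $P_\x(\v)$ can be computed exactly by enumerating the values $\u$, evaluating the functions in a causal order, and replacing $f_X$ by the constant $x$ under $do(x)$. The two-variable model below, with $X \leftarrow U_1 \lor U_2$ and $Y \leftarrow X \land U_1$, is a hypothetical example chosen so that $U_1$ confounds $X$ and $Y$; as a result, $P_{X=1}(Y=1) = 1/2$ differs from $P(Y=1 \mid X=1) = 2/3$. The names \texttt{P\_U}, \texttt{F}, \texttt{distribution}, and \texttt{prob} are illustrative only.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, Dict, Optional, Tuple

Values = Dict[str, int]

# Exogenous distribution P(u): U1 and U2 are independent fair coins (hypothetical).
P_U: Dict[Tuple[int, int], Fraction] = {
    u: Fraction(1, 4) for u in product((0, 1), repeat=2)
}

# Structural functions f_V(Pa_V, U_V), listed in a causal (topological) order.
F: Dict[str, Callable[[Values, Tuple[int, int]], int]] = {
    "X": lambda v, u: u[0] | u[1],    # X <- U1 or U2
    "Y": lambda v, u: v["X"] & u[0],  # Y <- X and U1 (U1 confounds X and Y)
}


def distribution(do: Optional[Values] = None) -> Dict[Tuple[int, ...], Fraction]:
    """Exact P_x(V) in the sub-model M_x; do=None gives the observational P(V)."""
    do = do or {}
    dist: Dict[Tuple[int, ...], Fraction] = {}
    for u, p_u in P_U.items():
        v: Values = {}
        for name, f in F.items():
            # Under do(x), the constant x replaces f_X.
            v[name] = do[name] if name in do else f(v, u)
        key = tuple(v[name] for name in F)  # ordered as (X, Y)
        dist[key] = dist.get(key, Fraction(0)) + p_u
    return dist


def prob(dist: Dict[Tuple[int, ...], Fraction], index: int, value: int) -> Fraction:
    """Marginal probability that the variable at position `index` equals `value`."""
    return sum((p for k, p in dist.items() if k[index] == value), Fraction(0))


obs = distribution()
p_cond = obs.get((1, 1), Fraction(0)) / prob(obs, 0, 1)  # P(Y=1 | X=1)
p_do = prob(distribution({"X": 1}), 1, 1)                 # P_{X=1}(Y=1)
print(p_cond, p_do)  # 2/3 1/2
```

Exact rational arithmetic is used so that the gap between conditioning and intervening is not obscured by floating-point noise.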

An SCM induces a causal diagram $\G$ over $\V$, where $V \rightarrow W$ if $V$ appears as an argument of $f_{W}$, and $V \dashleftarrow\dashrightarrow W$ if $\U_V \cap \U_W \neq \emptyset$ (\textit{i.e.}, $V$ and $W$ share an unobserved confounder). In a causal diagram, two nodes are said to be in the same $c$-component $\C \subseteq \V$ if and only if they are connected by a bi-directed path, \textit{i.e.}, a path composed entirely of edges ``$\dashleftarrow\dashrightarrow$''. For any set $\C \subseteq \V$, $Q[\C]:= P_{\v\backslash\c}(\c)$ denotes the post-interventional distribution of $\C$ under an intervention on $\V\backslash\C$. By definition, $Q[\V] = P(\v)$, and by convention, $Q[\emptyset] = 1$. 
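Since $c$-components are the connected components of a diagram restricted to its bidirected edges, they can be computed with a simple graph search. The Python sketch below (the function name \texttt{c\_components} is hypothetical) recovers the $c$-components of $\G$ in \Cref{fig:examples:a}, whose bidirected edges are $B \dashleftarrow\dashrightarrow X$ and $D \dashleftarrow\dashrightarrow C$; a node with no bidirected edge forms a singleton $c$-component.

```python
from typing import Dict, Iterable, List, Set, Tuple


def c_components(nodes: Iterable[str],
                 bidirected: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Partition `nodes` into c-components: connected components under bidirected edges only."""
    adj: Dict[str, Set[str]] = {n: set() for n in nodes}
    for a, b in bidirected:
        adj[a].add(b)
        adj[b].add(a)
    components: List[Set[str]] = []
    assigned: Set[str] = set()
    for start in adj:
        if start in assigned:
            continue
        comp: Set[str] = set()
        stack = [start]
        while stack:  # depth-first search along bidirected edges
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(adj[n] - comp)
        assigned |= comp
        components.append(comp)
    return components


# Diagram G of Figure (a): B <-> X and D <-> C are the only bidirected edges.
comps = c_components(["A", "B", "X", "C", "D"], [("B", "X"), ("D", "C")])
print(comps)  # three c-components: {A}, {B, X}, {C, D}
```

For instance, for the $c$-component $\C = \{B, X\}$, the definition above gives $Q[\C] = P_{a,c,d}(b, x)$.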


%Interventional distributions in PAGs can be related to observational distributions by algebraic methods. \citep{jaber2022causal} extended Pearl and Zhang's calculus using the notion of $m$-separation \cite[Def. 1]{jaber2022causal}. This effort culminated in 3 rules which systematically use properties of the graph to manipulate interventional distribution expressions. These manipulations can be applied until the causal effect is reduced to something computable from $P(\V)$.