\begin{center}
    {\huge \bfseries Appendix}
\end{center}
\vspace{1cm}

This appendix includes:
\begin{itemize}
    \item Additional background, related work, and a discussion on the limitations of the proposed approach in \Cref{sec:app_background}.
    \item Proofs in \Cref{sec:app_proofs}.
    \item Additional examples in \Cref{sec:app_experiments}.
    \item Further discussion on enumeration strategies, including algorithms for enumerating LEGs and MDB causal diagrams, in \Cref{sec:app_enumeration_strategies}.
\end{itemize}
\vspace{1cm}

\section{Background, related work, and limitations}
\label{sec:app_background}

\subsection{Background}
In this section, we review several graphical notions that are used in the main body of this document or will be used for the derivation of proofs.

First, we review manipulations of a causal diagram. Let $\G$ denote a causal diagram over $\V$ and let $\X \subseteq \V$. $\G_\X$ denotes the induced subgraph of $\G$ over $\X$. The $\X$-lower-manipulation of $\G$ deletes all edges out of variables in $\X$ and otherwise leaves $\G$ unchanged; the resulting graph is denoted $\G_{\underline{\X}}$. The $\X$-upper-manipulation of $\G$ deletes all edges into variables in $\X$ and otherwise leaves $\G$ unchanged; the resulting graph is denoted $\G_{\overline{\X}}$. Further, we use standard graph-theoretic family abbreviations for graphical relationships in $\G$, such as parents $pa$, descendants $\de$, ancestors $\an$, and spouses $\spo$. For example, $X\in \spo(Y)_{\G}$ if $X \dashleftarrow\dashrightarrow Y$ is present in $\G$. Capitalized versions $\Pa, \De, \An, \Spo$ include the argument as well, e.g. $\Pa(\X)_{\G} = pa(\X)_{\G} \union \X$. If $X\in \an(Y)_{\G}\inter \spo(Y)_{\G}$, we say that there is an almost directed cycle between $X$ and $Y$.
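
As a minimal illustration of this notation, consider a hypothetical diagram $\G$ over $\{Z, X, Y\}$ (chosen here only for illustration, not taken from the main text) with edges $Z \rightarrow X$, $X \rightarrow Y$, and $X \dashleftarrow\dashrightarrow Y$. Reading a bi-directed edge as having an arrowhead into both of its endpoints, the definitions above give
\begin{align*}
    \G_{\underline{X}} &: \quad Z \rightarrow X, \quad X \dashleftarrow\dashrightarrow Y, \\
    \G_{\overline{X}} &: \quad X \rightarrow Y, \\
    pa(Y)_{\G} &= \{X\}, \qquad \Pa(Y)_{\G} = \{X, Y\}, \\
    \an(Y)_{\G} &= \{Z, X\}, \qquad \spo(Y)_{\G} = \{X\}.
\end{align*}
Since $X \in \an(Y)_{\G} \inter \spo(Y)_{\G}$, there is an almost directed cycle between $X$ and $Y$ in this example.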

The following rules to manipulate experimental distributions produced by an intervention are known as the do-calculus and will be used for the proof of several theoretical statements \citep{pearl2009causality}.

\begin{theorem}[Inference rules of $do$-calculus]
    \label{thm:do_calculus}
    Let $\G$ be a causal diagram compatible with an SCM $\M$, with endogenous variables $\V$. For any disjoint subsets $\X, \Y \subseteq \V$ and any two disjoint subsets $\Z,\W \subseteq \V \backslash (\Y \union \X)$, the following rules are valid for any intervention strategies $do(\X=\x)$, $do(\Z=\z)$:
     \begin{itemize}
         \item Rule 1 (Insertion/Deletion of observations):
         \begin{align*}
             P_{\x}(\y \mid \w, \z) = P_{\x}(\y \mid \w) \quad \text{ if }\quad (\Z \indep \Y \mid \W, \X)_{\G_{\overline{\X}}}.
         \end{align*}
         \item Rule 2 (Change of regimes):
         \begin{align*}
             P_{\x, \z}(\y \mid \w) = P_{\x}(\y \mid \z, \w) \quad \text{ if }\quad (\Y \indep \Z \mid \W, \X)_{\G_{\overline{\X},\underline{\Z}}}.
         \end{align*}
         \item Rule 3 (Insertion/Deletion of interventions):
         \begin{align*}
            P_{\x, \z}(\y \mid \w) = P_{\x}(\y \mid \w) \quad \text{ if }\quad (\Y \indep \Z \mid \W, \X)_{\G_{\overline{\X,\Z(\W)}}}.
         \end{align*}
     \end{itemize}
     where $\Z(\W)$ is the set of elements in $\Z$ that are not ancestors of $\W$ in $\G_{\overline{\X}}$. 
 \end{theorem}
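
As a sketch of how these rules combine, consider a hypothetical diagram $\G$ with edges $Z \rightarrow X$, $Z \rightarrow Y$, and $X \rightarrow Y$ and no bi-directed edges (an assumption made only for this illustration). The familiar adjustment formula then follows from Rules 2 and 3:
\begin{align*}
    P_{x}(y) &= \sum_{z} P_{x}(y \mid z)\, P_{x}(z) && \text{(marginalization over $Z$)} \\
             &= \sum_{z} P(y \mid x, z)\, P_{x}(z) && \text{(Rule 2, since $(Y \indep X \mid Z)_{\G_{\underline{X}}}$)} \\
             &= \sum_{z} P(y \mid x, z)\, P(z) && \text{(Rule 3, since $(Z \indep X)_{\G_{\overline{X}}}$)}.
\end{align*}
In $\G_{\underline{X}}$ the only path between $X$ and $Y$ is $X \leftarrow Z \rightarrow Y$, which is blocked by $Z$; in $\G_{\overline{X}}$ the only path between $X$ and $Z$ is $X \rightarrow Y \leftarrow Z$, which is blocked by the unconditioned collider $Y$.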
 
 
Next, we consider operations on equivalence classes starting with a more complete definition of a Maximal Ancestral Graph.

\begin{definition}[Maximal Ancestral Graph]\label[definition]{def:MAG}
  \textit{A mixed graph is ancestral if it does not contain directed or almost directed cycles. It is maximal if, for every pair of nonadjacent vertices $(X, Y)$, there exists a set $\Z\subset \V$ that $d$-separates them. A Maximal Ancestral Graph (MAG) is a graph that is both ancestral and maximal.}
\end{definition}

\begin{figure*}
\centering
\begin{tikzpicture}[SCM,scale=1]
        \node (V1) at (0,0) {$V_1$};
        \node (V2) at (1,0) {$V_2$};
        \node (V3) at (2,0) {$V_3$};

        \path [conf-path] (V1) edge [out=45,in=135] (V2);
        \path [<-] (V2) edge (V3);
        
        \node (V1) at (4,0) {$V_1$};
        \node (V2) at (5,0) {$V_2$};
        \node (V3) at (6,0) {$V_3$};

        \path [->] (V1) edge (V2);
        \path [<-] (V2) edge (V3);
\end{tikzpicture}
\caption{MAG (left), LEG (right).}
\label{fig:app_mag_leg}
\end{figure*}
Given a causal graph over $\V$, a unique MAG over $\V$ can be constructed such that both independence and non-ancestral relations among $\V$ are retained; see e.g. \cite[Sec. 3]{zhang2008causal}. Two MAGs are said to be Markov equivalent if they entail the same set of $d$-separations. Among Markov equivalent MAGs, a particular subset, called Loyal Equivalent Graphs (LEGs) and defined in \Cref{def:LEG}, consists of those with the fewest bi-directed edges. All bi-directed edges in a LEG are invariant, \textit{i.e.} they appear in all MAGs with the same $d$-separations and non-ancestral relations \cite[Corollary 18]{zhang2005characterization}. Thus, a MAG and its LEG can differ in only one way: some bi-directed edges in the MAG are oriented as directed edges in its LEG, as illustrated in \Cref{fig:app_mag_leg}.

An important consequence of the definition of LEGs is that one can traverse the space of Markov equivalent LEGs by checking whether directed edges can be reversed with a simple criterion, restated below from \cite[Lemma 2]{zhang2012transformational}.


\begin{proposition}[Transformational characterization of LEGs]
\label{prop:LEG}
  \textit{Let $\G$ be an arbitrary LEG, and $X \rightarrow Y$ an arbitrary directed edge in $\G$. The reversal of $X \rightarrow Y$ produces a Markov equivalent LEG if and only if $\Pa(X)_{\G} = pa(Y)_{\G}$ and $\spo(X)_{\G} = \spo(Y)_{\G}$.}
\end{proposition}

\begin{proof}
The proof can be found in \citet{zhang2012transformational}.
\end{proof}
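
To illustrate the criterion, consider first the LEG in \Cref{fig:app_mag_leg} (right), $V_1 \rightarrow V_2 \leftarrow V_3$. Here $\Pa(V_1)_{\G} = \{V_1\}$ while $pa(V_2)_{\G} = \{V_1, V_3\}$, so the parent condition fails and $V_1 \rightarrow V_2$ cannot be reversed; indeed, doing so would destroy the collider at $V_2$. By contrast, in the hypothetical chain $V_1 \rightarrow V_2 \rightarrow V_3$ (not part of the figure, used here only for illustration) no vertex has a spouse and
\begin{align*}
    \Pa(V_1)_{\G} = \{V_1\} = pa(V_2)_{\G}, \qquad \Pa(V_2)_{\G} = \{V_1, V_2\} \neq \{V_2\} = pa(V_3)_{\G},
\end{align*}
so $V_1 \rightarrow V_2$ can be reversed, giving $V_1 \leftarrow V_2 \rightarrow V_3$, whereas $V_2 \rightarrow V_3$ cannot be reversed directly. After the first reversal, however, the parents of $V_3$ are $\{V_2\}$ and $V_2$ has no parents, so $V_2 \rightarrow V_3$ becomes reversible, and $V_1 \leftarrow V_2 \leftarrow V_3$ is reached through a sequence of two admissible reversals.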

Two Markov equivalent LEGs can always be transformed into each other via a sequence of reversals satisfying \Cref{prop:LEG}. As in the definition for PAGs, a directed edge $X \rightarrow Y$ in a MAG is said to be visible if no causal graph compatible with this MAG contains an edge $X \dashleftarrow\dashrightarrow Y$, that is, unobserved confounding between $X$ and $Y$ can be ruled out. Visibility of an edge can be determined by a graphical condition \cite[Lemma 9]{zhang2008causal}. Directed edges that are not visible are called invisible. \Cref{prop:LEG} is important because, although listing all Markov equivalent MAGs is in general infeasible, one could in principle list all Markov equivalent LEGs by checking the reversal of invisible directed edges with this graphical criterion. An explicit algorithm for generating Markov equivalent LEGs is given in \Cref{sec:app_enumeration_strategies}. Finally, for completeness, we reproduce the IDP algorithm \citep{jaber2019causal} for identifying causal effects from a PAG in \Cref{alg:idp}.


\renewcommand{\algorithmicrequire}{\textbf{Input:}}
\renewcommand{\algorithmicensure}{\textbf{Output:}}

\begin{algorithm}[t]
    %\fontsize{9}{9}\selectfont
    \caption{IDP}
    \label{alg:idp}
    \begin{algorithmic}[1]
    \REQUIRE A PAG $\1P$ and disjoint sets $\X,\Y\subset\V$
    \ENSURE Expression for $P_{\x}(\y)$ or FAIL
    \STATE Let $\D := \texttt{PossAn}(\Y)_{\1P_{\V\backslash\X}}$
    \RETURN $\sum_{\d\backslash\y} \texttt{ID}(\D, \V, P)$\\\medskip
    \STATE \textbf{function} \texttt{ID}$(\C, \T, Q = Q[\T])$\\\smallskip
    \begin{ALC@g}
	\STATE if $\C = \emptyset$ then return 1.
	\STATE if $\C = \T$ then return $Q$.\\\smallskip
	/* In $\1P_\T$, let $\B$ denote a bucket, and let $\C_\B$ denote the $pc$-component of $\B$ */ \\\smallskip
	\IF{$\exists \B \subset \T \backslash \C$ such that $\C_\B \inter \texttt{PossCh}(\B)_{\1P_{\T}} \subseteq \B$}
		\STATE Compute $Q[\T \backslash \B]$ from $Q[\T]$ via \cite[Prop. 2]{jaber2018causal}.
		\RETURN $\texttt{ID}(\C,\T \backslash \B,Q[\T \backslash \B])$\smallskip
	\ELSIF{$\exists \B \subset \C$ such that $\1R_\B \neq \C$}
	    \RETURN $\texttt{ID}(\1R_{\B},\T,Q) \times \texttt{ID}(\1R_{\C \backslash \1R_\B} ,\T,Q)\hspace{0.1cm} / \hspace{0.1cm} \texttt{ID}(\1R_\B \inter\1R_{\C \backslash \1R_\B}, \T, Q)$
	\ELSE
		\RETURN FAIL \smallskip
    \ENDIF
    \end{ALC@g}
    \end{algorithmic}
\end{algorithm}



\subsection{Related Work}
In this section, we review related work on bounding causal effects given a fully specified causal graph, as no treatment of equivalence classes has been proposed yet.

The natural bounds on causal effects due to \cite{robins1989analysis,manski1990nonparametric} were developed with a specific focus on pairs of variables or on studies with imperfect compliance and instrumental variable assumptions. Their proof technique has since motivated several works extending these bounds to arbitrary causal diagrams. For example, \citet{zhang2020designing,zhang2019near} extended earlier bounding strategies to estimate system dynamics in sequential decision-making settings and causal effects in more general graphs. Our work can be interpreted as part of this line of research, extending the natural bounding technique to systems characterized by Partial Ancestral Graphs.
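
For concreteness, in the simplest setting of a single treatment $X$ and outcome $Y$ with no further assumptions, the natural bounds take the following standard form, stated here as a sketch in the notation of this appendix:
\begin{align*}
    P(x, y) \;\leq\; P_{x}(y) \;\leq\; P(x, y) + 1 - P(x).
\end{align*}
They follow by splitting $P_{x}(y)$ according to the naturally observed treatment: units with $X = x$ contribute $P(x, y)$ by consistency, while the contribution of units with $X \neq x$ is unidentified and lies between $0$ and $1 - P(x)$.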

In the partial identification literature, another line of research was pioneered by the seminal work of \citet{balke1997bounds}, which employs a polynomial optimization program to compute causal bounds that are provably optimal. They proposed a family of canonical models with finite unobserved states, which sufficiently represent all observations and consequences of interventions in instrumental variable models. Based on this canonical characterization, \citet{balke1997bounds} reduced the bounding problem to a series of equivalent linear programs. \citet{chickering1996clinician} further used Bayesian techniques to investigate the sharpness of these bounds with regard to the observational sample size. More recently, \citet{zhang2021partial,finkelstein2020deriving} described a polynomial programming approach to partial identification in general causal graphs. They generalize the canonical characterization of SCMs to arbitrary graphs, although they require discrete endogenous variables with small support, as the time complexity of their algorithm grows exponentially with the size of the support of the variables. In continuous settings, \citet{gunsilius2019bounds} extends the linear programming approach to partial identification in instrumental variable graphs with continuous treatments. Several recent works follow a similar approach: they parameterize causal effects as a linear combination of a set of fixed basis functions \citep{padh2022stochastic} or as neural networks \citep{balazadeh2022partial,hu2021generative}, and subsequently match the (moments of the) observed distribution while minimizing and maximizing causal effects.

In applications, partial identification has been used in reinforcement learning for the estimation of dynamic treatment regimes \citep{zhang2020designing,zhang2019near}, for estimating policies under safety constraints \citep{joshi2024towards}, and within bandit algorithms \citep{zhang2021bounding,bellot2024transportability}. It has similarly been used in problems of fairness, for example by \cite{wu2019pc}.


\subsection{Limitations}
In this work, we start from the assumption that the true PAG underlying a system of interest can be inferred from data. In general, this requires an assumption of faithfulness, \textit{i.e.} that the independencies in the data imply corresponding separations in the underlying causal diagram, and an oracle for testing conditional independencies. Learning the true PAG from finite data can be a significant challenge in practice \citep{spirtes2000causation,robins2003uniform,zhang2012strong,bellot2022discovery,bellot2021deconfounded}. In higher-dimensional systems, the computational complexity of estimating the conditional distributions that define lower and upper bounds on causal effects is another substantial challenge. In light of this, it is important to distinguish between the task of partial identification, that is, inferring an expression that bounds causal effects, and the task of causal effect estimation, that is, providing efficient estimators that compute these bounds from finite samples. Our results concern the first task (partial identification). The objective of our procedure is to decide whether the effect can be bounded and to provide expressions for the lower and upper bounds, while remaining agnostic as to whether $P(\V)$ can be accurately estimated from the available samples. Several works consider the efficient estimation of identifiable causal effects \citep{jung2021estimating}; extending these techniques to the problem of bounding non-identifiable causal effects is an important direction for future work. Finally, we emphasize that simulations on real and synthetic data are provided for illustration purposes only. These results do not recommend or advocate for the implementation of a particular intervention, and should be considered in practice in combination with other aspects of the decision-making process.