\section{Causal Partitions: Extended Definitions}
\label{sec:prelim_appendix}

\subsection{Partition $\z_5$: Instrumental Variables and Their Proxies}

\input{figure_tex/figure_z5_z1}


Instrumental variable methods are widely used in econometrics \citep{imbens_instrumental_2014} and epidemiology \citep{hernan_instruments_2006, labrecque_understanding_2018} to estimate causal effects in the presence of latent confounding. The present work explores an additional way to relate instrumental variables to the problem of confounding: the marginal independence between certain instrument-confounder pairs is exploited to detect confounders in unknown causal structures. We define an instrument as any variable that meets the criteria enumerated in Definition \ref{def:z5}. We then state Proposition \ref{prop:z5_z1_ind} on the relationship between $\z_1$ and $\z_5$, which provides the theoretical basis for sufficient condition \ref{cond:sufficient_3}. The proof of Proposition \ref{prop:z5_z1_ind} follows from Propositions \ref{prop:z5_z1_collider} and \ref{prop:root_ind_z4_z5}.

\begin{definition} [Instrumental variable, \citet{lousdal_introduction_2018}] An instrument is any variable that meets the following criteria: \label{def:z5}
\begin{enumerate}[noitemsep,topsep=0pt]
    \item \textit{Relevance assumption}: The instrument is causal for exposure $X$.
    \item \textit{Exclusion restriction}: The effect of the instrument on outcome $Y$ is fully mediated by $X$.
    \item \textit{Exchangeability assumption}: The instrument and $Y$ do not share a common cause. 
\end{enumerate}
\end{definition}

\begin{proposition}
Any instrument (or proxy) $Z_5 \in \z_5$ will meet the following criteria with respect to at least one confounder (or proxy) $Z_1 \in \z_1$ on every backdoor path in $\g$.
\begin{enumerate}[noitemsep,topsep=0pt]
    %\item Both $X$ and $Y$ are marginally dependent on $Z_5$.
    \item $Z_5$ and $Z_1$ are marginally independent.
    \item $Z_5$ and $Z_1$ are conditionally dependent given $X$.
\end{enumerate}
    \label{prop:z5_z1_ind}
\end{proposition} 
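As a minimal sanity check of Proposition \ref{prop:z5_z1_ind}, consider a hypothetical four-node DAG with edges $Z_5 \rightarrow X$, $Z_1 \rightarrow X$, $Z_1 \rightarrow Y$, and $X \rightarrow Y$. This graph is our own illustration rather than one of the graphs in Figure \ref{fig:z5_z1}, and we assume faithfulness so that d-connection implies dependence. The only two paths between $Z_5$ and $Z_1$ are
\begin{align*}
\pi_1 &: Z_5 \rightarrow X \leftarrow Z_1, \\
\pi_2 &: Z_5 \rightarrow X \rightarrow Y \leftarrow Z_1.
\end{align*}
Marginally, $\pi_1$ is blocked by the collider $X$ and $\pi_2$ by the collider $Y$, so $Z_5 \perp\!\!\!\perp Z_1$. Conditioning on $X$ opens $\pi_1$ (while blocking $\pi_2$ at the non-collider $X$), so $Z_5 \nind Z_1 \mid X$, matching both criteria of the proposition.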


%%%%%%%%%%%%%

\subsection{Partition $\z_4$} 

To our knowledge, partition $\z_4$ has been characterized and used far less in the causal inference literature than confounders ($\z_1$), colliders ($\z_2$), mediators ($\z_3$), and instrumental variables ($\z_5$). Members of this partition have occasionally been referred to as \textit{pure prognostic variables} \citep{hahn_feature_2022}. We elaborate on our definition of $\z_4$ below.

\begin{definition}[Partition $\z_4$] Partition $\z_4$ comprises all non-descendants of $Y$ that are marginally dependent on $Y$ but marginally independent of $X$ (Table \ref{tab:partitions}). It follows that any $Z_4 \in \z_4$ participates in a $v$-structure $X \cdots \rightarrow Y \leftarrow \cdots Z_4$. This implies the following:
    \begin{enumerate}[noitemsep,topsep=0pt]
    \item $X$ cannot share active paths with any $Z_4$. Thus, $X$ can share no common causes with any $Z_4$.
    \item Each $Z_4 \in \z_4$ is conditionally dependent on $X$ given $Y$. This implicitly requires that the $X$ and $Y$ under consideration be marginally dependent (an assumption made in Section \ref{sec:identifiability}), though they need not be adjacent in $\g$.
\end{enumerate}
\label{def:z4}
\end{definition}
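For a minimal illustration of Definition \ref{def:z4}, take the hypothetical DAG with edges $X \rightarrow Y$ and $Z_4 \rightarrow Y$ (our own example), again assuming faithfulness. The only path between $Z_4$ and $X$ is $Z_4 \rightarrow Y \leftarrow X$, on which $Y$ is a collider, so
\begin{equation*}
Z_4 \perp\!\!\!\perp X, \qquad Z_4 \nind Y, \qquad Z_4 \nind X \mid Y,
\end{equation*}
where the second relation follows from the edge $Z_4 \rightarrow Y$ and the third from conditioning on the collider $Y$. This is the simplest instance of both implications listed above.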

\subsection{Additional Propositions on $\z_4$ and $\z_5$}

Here, we introduce several propositions that describe the properties of $\z_4$ and $\z_5$ in relation to each other and to $\z_1$. Let $\mathcal{P}$ be a backdoor path in $\g$. 

\begin{proposition}
    If a $Z_4 \in \z_4$ is marginally dependent on some $Z_1 \in \z_1$ on $\mathcal{P}$ (i.e., the two share an active path, so that $Z_4 \nind Z_1$), then $Z_4$ must form a $v$-structure $Z_4 \cdots \rightarrow Z_1 \leftarrow \cdots Z_1'$, where $Z_1'$ lies between $Z_1$ and $X$ on $\mathcal{P}$. Otherwise, $Z_4$ would share an active path with $X$, which violates the definition of $\z_4$ (Definition \ref{def:z4}). In Figure \ref{fig:z4_z5_z1_paths} (right-hand DAG), examples include $Z_4 \rightarrow Z_1^3 \leftarrow Z_1^2$ and $Z_4 \rightarrow Z_1^3 \leftarrow Z_1^5$. Together with Definition \ref{def:z4}, this proposition implies that no $Z_4$ is ever marginally dependent on a $Z_1$ that is directly adjacent to $X$. \label{prop:z4_z1_collider}
\end{proposition}
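The following hypothetical fragment (our own construction, assuming faithfulness) illustrates Proposition \ref{prop:z4_z1_collider}. Let $\mathcal{P}$ be the backdoor path $X \leftarrow Z_1' \rightarrow Z_1 \rightarrow Y$, and add the edges $X \rightarrow Y$ and $Z_4 \rightarrow Z_1$. Every path from $Z_4$ to $X$ contains a collider:
\begin{align*}
&Z_4 \rightarrow Z_1 \leftarrow Z_1' \rightarrow X && \text{(collider } Z_1\text{)},\\
&Z_4 \rightarrow Z_1 \rightarrow Y \leftarrow X && \text{(collider } Y\text{)},
\end{align*}
so $Z_4 \perp\!\!\!\perp X$, whereas $Z_4 \rightarrow Z_1 \rightarrow Y$ gives $Z_4 \nind Y$; hence $Z_4 \in \z_4$. The same two colliders block every path from $Z_4$ to $Z_1'$, the node adjacent to $X$, so $Z_4 \perp\!\!\!\perp Z_1'$. Had we instead drawn $Z_4 \rightarrow Z_1'$, the path $Z_4 \rightarrow Z_1' \rightarrow X$ would be active and $Z_4$ would no longer belong to $\z_4$.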

\begin{proposition}
    If a $Z_5 \in \z_5$ is marginally dependent on some $Z_1 \in \z_1$ on $\mathcal{P}$ (i.e., the two share an active path, so that $Z_5 \nind Z_1$), then $Z_5$ must form a $v$-structure $Z_5 \cdots \rightarrow Z_1 \leftarrow \cdots Z_1'$, where $Z_1'$ lies between $Z_1$ and $Y$ on $\mathcal{P}$. Otherwise, $Z_5$ would share an active path with $Y$ that does not pass through $X$, which violates the definition of $\z_5$ (Definition \ref{def:z5}). In Figure \ref{fig:z4_z5_z1_paths} (right-hand DAG), examples include $Z_5 \rightarrow Z_1^1 \leftarrow Z_1^2$ and $Z_5 \rightarrow Z_1^1 \leftarrow Z_1^4$. Together with Definition \ref{def:z5}, this proposition implies that no $Z_5$ is ever marginally dependent on a $Z_1$ that is directly adjacent to $Y$.\label{prop:z5_z1_collider} 
\end{proposition}
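A mirror-image hypothetical fragment (again our own construction, assuming faithfulness) illustrates Proposition \ref{prop:z5_z1_collider}. Let $\mathcal{P}$ be $X \leftarrow Z_1 \leftarrow Z_1' \rightarrow Y$, and add the edges $X \rightarrow Y$ and $Z_5 \rightarrow Z_1$. The paths from $Z_5$ to $Y$ are
\begin{align*}
&Z_5 \rightarrow Z_1 \rightarrow X \rightarrow Y && \text{(active, mediated by } X\text{)},\\
&Z_5 \rightarrow Z_1 \leftarrow Z_1' \rightarrow Y && \text{(blocked at the collider } Z_1\text{)},
\end{align*}
so the effect of $Z_5$ on $Y$ is fully mediated by $X$, as Definition \ref{def:z5} requires. Both paths from $Z_5$ to $Z_1'$, the node adjacent to $Y$, contain a collider ($Z_1$ or $Y$), so $Z_5 \perp\!\!\!\perp Z_1'$. Because $X$ is a descendant of the collider $Z_1$, conditioning on $X$ opens $Z_5 \rightarrow Z_1 \leftarrow Z_1'$, giving $Z_5 \nind Z_1' \mid X$, in line with Proposition \ref{prop:z5_z1_ind}. Had we instead drawn $Z_5 \rightarrow Z_1'$, the path $Z_5 \rightarrow Z_1' \rightarrow Y$ would bypass $X$ and violate the exclusion restriction.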

\begin{proposition} [A single $Z_1 \in \z_1$ cannot be a collider for a $Z_4 \in \z_4$ and a $Z_5 \in \z_5$] \label{prop:z4_z5_z1_collider}
    %Given Propositions \ref{prop:z4_z1_collider} and \ref{prop:z5_z1_collider}, any backdoor path containing a $Z_1$ that shares an active path with either a $Z_4$ or a $Z_5$ must also contain a $Z_1$ that acts as a collider such that the paths from $Z_4$ to $X$ and from $Z_5$ to $Y$ are inactive. 
    If a single $Z_1$ were a collider for both $Z_4$ and $Z_5$, then $Z_4$ would share an active path with $X$ and $Z_5$ would share an active path with $Y$ that does not pass through $X$, violating the definitions of these partitions. %This observation also offers one justification for the forbidden active path between $Z_1^5$ and $Z_1^7$ in Figure \ref{fig:z4_z5_z1_paths} (left-hand DAG).
\end{proposition}

Next, we introduce the concepts of \textit{root}-$Z_1$ and \textit{collider}-$Z_1$. We observe that every backdoor path features a $Z_1$ that acts as a \textit{root} node for that path, i.e., a common cause of $X$, $Y$, and every $Z_1$ that descends from it along the path toward $X$ or $Y$. In Figure \ref{fig:z4_z5_z1_paths}, $\{Z_1^1, Z_1^3, Z_1^6\}$ are roots for backdoor paths in the left-hand DAG, while $\{Z_1^2, Z_1^4, Z_1^5\}$ are roots for backdoor paths in the right-hand DAG.

When multiple backdoor paths in $\g$ overlap (i.e., share subpaths), some $Z_1$ can act as \textit{colliders} for two parents in $\z_1$. In Figure \ref{fig:z4_z5_z1_paths}, $\{Z_1^2, Z_1^4\}$ are \textit{collider}-$Z_1$ on overlapping backdoor paths in the left-hand DAG, while $\{Z_1^1, Z_1^2, Z_1^3\}$ are \textit{collider}-$Z_1$ on backdoor paths in the right-hand DAG. Note that node $Z_1^2$ in the right-hand DAG simultaneously acts as a \textit{root}-$Z_1$ and a \textit{collider}-$Z_1$ for different backdoor paths. Thus, $Z_1^2$ is not a \textit{true root} in the classical graph-theoretic sense of having no parents.
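The existence of a \textit{root}-$Z_1$ can be checked directly for backdoor paths that contain no colliders. The following sketch assumes that $\mathcal{P}$ is such a path and that $Y$ is not an ancestor of $X$; the labels $V_i$ are introduced only for this argument. Write $\mathcal{P} = (X = V_0, V_1, \ldots, V_m = Y)$. Because $\mathcal{P}$ is a backdoor path, its first edge is $V_0 \leftarrow V_1$. If some edge points toward $Y$, say $V_{i-1} \rightarrow V_i$ with $i < m$, then the next edge must be $V_i \rightarrow V_{i+1}$, since $V_i \leftarrow V_{i+1}$ would make $V_i$ a collider. Hence
\begin{equation*}
\mathcal{P}: \quad X \leftarrow V_1 \leftarrow \cdots \leftarrow V_k \rightarrow V_{k+1} \rightarrow \cdots \rightarrow Y, \qquad 1 \le k \le m-1,
\end{equation*}
where $k < m$ because $k = m$ would make $\mathcal{P}$ a directed path from $Y$ to $X$. The node $V_k$ is the unique node on $\mathcal{P}$ whose two path edges both point away from it, and it is an ancestor of every other node on $\mathcal{P}$, which is exactly the \textit{root}-$Z_1$ described above.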

\begin{proposition}[The \textit{root}-$Z_1$ of a backdoor path is never marginally dependent on either a $Z_4$ or a $Z_5$] \label{prop:root_ind_z4_z5} As every \textit{root}-$Z_1$ is causal for both $X$ and $Y$, marginal dependence on either a $Z_4$ or a $Z_5$ would violate Propositions \ref{prop:z4_z1_collider}, \ref{prop:z5_z1_collider}, and \ref{prop:z4_z5_z1_collider}.
\end{proposition}
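One way to spell out this argument, assuming the Markov property, faithfulness, and that $Y$ is not an ancestor of $X$, uses the standard fact that two nodes in a DAG are d-connected given the empty set if and only if they share a common ancestor, where each node counts as its own ancestor. Write $R$ for the \textit{root}-$Z_1$ of $\mathcal{P}$ and $\mathrm{An}(V)$ for the ancestors of $V$; both symbols are introduced only for this sketch. Since $R \in \mathrm{An}(X) \cap \mathrm{An}(Y)$,
\begin{align*}
Z_4 \nind R
&\;\Rightarrow\; \exists\, C \in \mathrm{An}(Z_4) \cap \mathrm{An}(R) && \text{(Markov property)}\\
&\;\Rightarrow\; C \in \mathrm{An}(Z_4) \cap \mathrm{An}(X) && \text{(since } R \in \mathrm{An}(X)\text{)}\\
&\;\Rightarrow\; Z_4 \nind X && \text{(faithfulness)},
\end{align*}
contradicting Definition \ref{def:z4}. The same first step applied to $Z_5 \nind R$ yields some $C \in \mathrm{An}(Z_5) \cap \mathrm{An}(R)$. The directed path from $C$ to $R$ cannot contain $X$ without creating a cycle, and the directed segment of $\mathcal{P}$ from $R$ to $Y$ does not contain $X$, so $C$ reaches $Y$ without passing through $X$. If $C = Z_5$, this is an effect of $Z_5$ on $Y$ that bypasses $X$, violating the exclusion restriction; otherwise, $C$ is a common cause of $Z_5$ and $Y$, violating the exchangeability assumption of Definition \ref{def:z5}.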

\input{figure_tex/figure_z4_z5_z1_paths}

%%%%%%%%%%

\subsection{Proxy variables}

Several of the causal partitions defined in this work include notions of \textit{proxy variables}. These proxies are conceptually related to proxy variables previously described in the causal literature, though they differ in some respects. First, the path types enumerated in Table \ref{tab:path_types} allow proxies of confounders to be classified as $\z_1$. A descendant proxy can act as a noisy stand-in for its respective confounder \citep{pearl2012measurement}, and adjusting for this proxy when the confounder is unobserved can in principle reduce confounding bias, although this is not guaranteed in general \citep{vanderweele_principles_2019}. Likewise, proxy instruments are a notable variable type in the instrumental variable literature that falls under our definition of $\z_5$ (Figure \ref{fig:z5_z1}). We generalize the notion of a proxy here to refer to any member of $\z_1$ that does not lie on a backdoor path (and thus cannot fully block it), as well as the analogous members of $\z_2$ and $\z_3$. For the purposes of this work, we generally assume that both the proxy and the variable it proxies are observed, although the literature also considers cases where the proxied variable is unobserved \citep{wang2021proxy}.
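The bias-reduction claim can be made concrete in one special case. The following linear-Gaussian model is a hypothetical illustration of our own (the symbols $U$, $W$, $a$, $\beta$, $\gamma$, and $s^2$ are not used elsewhere in this work), with unobserved confounder $U$, descendant proxy $W$, $U$ and all noise terms mutually independent, zero-mean noise, $\operatorname{Var}(\varepsilon_X) = 1$, and $\operatorname{Var}(\varepsilon_W) = s^2 > 0$:
\begin{align*}
U &\sim \mathcal{N}(0,1), & X &= aU + \varepsilon_X, \\
W &= U + \varepsilon_W, & Y &= \beta X + \gamma U + \varepsilon_Y.
\end{align*}
The population least-squares coefficient on $X$ is then
\begin{align*}
\text{without adjustment:} \quad & \beta + \gamma\,\frac{a}{1+a^2}, \\
\text{adjusting for } W\text{:} \quad & \beta + \gamma\,\frac{a s^2}{1 + s^2(1+a^2)},
\end{align*}
where the second expression follows from projecting $U$ onto $(X, W)$, whose covariance matrix is $\bigl(\begin{smallmatrix} 1+a^2 & a \\ a & 1+s^2 \end{smallmatrix}\bigr)$ and whose covariance with $U$ is $(a, 1)$. The adjusted bias term equals the unadjusted one multiplied by $s^2(1+a^2)/\bigl(1 + s^2(1+a^2)\bigr) < 1$, so adjusting for $W$ shrinks the bias whenever $a\gamma \neq 0$, removes it as $s^2 \to 0$, and recovers the unadjusted bias as $s^2 \to \infty$. This behavior is a property of the linear model; consistent with the caveat above, it need not hold in general.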

\input{figure_tex/figure_proxies}

\begin{definition}[Proxy variables in $\z_1$, $\z_2$, and $\z_3$] \label{def:proxy}
    A proxy variable for $\z_1$, $\z_2$, or $\z_3$ is a member of one of these partitions that is an ancestor or descendant of another member of the same partition, such that the proxy is not strictly a confounder, collider, or mediator, but still satisfies the allowable path types for its partition (as defined in Table \ref{tab:path_grid}). Examples include members of $\z_3$ that do not lie on a mediator chain but descend from a member of $\z_3$ that does, and members of $\z_1$ that do not lie on a backdoor path but are ancestors of a member of $\z_1$ that does (Figure \ref{fig:proxies}).
\end{definition}
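As a minimal illustration of a proxy in $\z_3$, consider the hypothetical DAG $X \rightarrow M \rightarrow Y$ with an additional edge $M \rightarrow M'$ (the names $M$ and $M'$ are ours), assuming faithfulness. Here $M$ lies on the mediator chain and $M'$ is a descendant proxy. Both are dependent on $X$ and on $Y$, but only $M$ blocks the mediating path:
\begin{equation*}
X \perp\!\!\!\perp Y \mid M, \qquad X \nind Y \mid M',
\end{equation*}
because $M'$ is not on the path $X \rightarrow M \rightarrow Y$, which therefore stays active when we condition on $M'$ alone. This mirrors the observation above that a proxy in $\z_1$ that does not lie on a backdoor path cannot fully block it.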


