\documentclass[accepted]{uai2024}  % for initial submission
%\documentclass[accepted]{uai2024} % after acceptance, for a revised version; 
% also before submission to see how the non-anonymous paper would look like 
                        
%% There is a class option to choose the math font
% \documentclass[mathfont=ptmx]{uai2024} % ptmx math instead of Computer
                                         % Modern (has noticeable issues)
% \documentclass[mathfont=newtx]{uai2024} % newtx fonts (improves upon
                                          % ptmx; less tested, no support)
% NOTE: Only keep *one* line above as appropriate, as it will be replaced
%       automatically for papers to be published. Do not make any other
%       change above this note for an accepted version.

%% Choose your variant of English; be consistent
\usepackage[american]{babel}
% \usepackage[british]{babel}

%% Some suggested packages, as needed:
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
% \usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams

\usepackage{algorithm}
\usepackage{algorithmic}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{appendix}
\usepackage{xr}
\usepackage[labelformat=simple]{subcaption}
\renewcommand\thesubfigure{(\alph{subfigure})}

\usepackage{hyperref}
\hypersetup{
    colorlinks=true,
    linkcolor=blue,
    filecolor=magenta,      
    urlcolor=cyan,
    citecolor=violet
    }

\newtheorem{theorem}{Theorem}
\newtheorem{corollary}{Corollary}[theorem]


\DeclareMathOperator*{\inprob}{\stackrel{P}{\longrightarrow}}
\DeclareMathOperator*{\inproblow}{\rightarrow_{p}}
\DeclareMathOperator*{\inoutprob}{\stackrel{P^*}{\longrightarrow}}
\DeclareMathOperator*{\indist}{\stackrel{d}{\longrightarrow}}
\DeclareMathOperator*{\indistlow}{\rightarrow_{d}}
\DeclareMathOperator*{\bounded}{O_P}
\DeclareMathOperator*{\fasterthan}{o_P}
\DeclareMathOperator*{\fasterthandet}{o}
\DeclareMathOperator*{\boundeddet}{O}
\DeclareMathOperator*{\Iff}{\textrm{iff}}


\newcommand\ci{\perp\!\!\!\perp}
\newcommand\bias[1]{{\bf 1}_{ #1 }}
\newcommand{\G}{{\mathcal G}}
\newcommand{\I}{{\mathbb I}}
\newcommand{\quotient}[2]{\ensuremath{{}^{#1}\!/\!_{#2}}}
\newcommand{\E}{\mathbb{E}}

\renewcommand{\b}[1]{\mathbf{#1}}
\newcommand{\bs}[1]{\boldsymbol{#1}}
\newcommand{\s}[1]{\mathcal{#1}}
\renewcommand{\d}[1]{\mathbb{#1}}
\newcommand{\h}[1]{\hat{#1}}
\renewcommand{\t}[1]{\tilde{#1}}
\newcommand{\n}[1]{\mathrm{#1}}

%% Provided macros
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)

%% Self-defined macros
\newcommand{\swap}[3][-]{#3#1#2} % just an example

\title{Statistical and Causal Robustness for Causal Null Hypothesis Tests}

% The standard author block has changed for UAI 2024 to provide
% more space for long author lists and allow for complex affiliations
%
% All author information is automatically removed by the class for the
% anonymous submission version of your paper, so you can already add your
% information below.
%
% Add authors
\author[1,*]{Junhui Yang}
\author[2,*]{Rohit Bhattacharya}
\author[3]{Youjin Lee}
\author[1]{Ted Westling}
% Add affiliations after the authors
\affil[1]{%
    Department of Mathematics and Statistics\\
    University of Massachusetts Amherst\\
    Amherst, Massachusetts, USA
}
\affil[2]{%
    Department of Computer Science\\
    Williams College\\
    Williamstown, Massachusetts, USA
}
\affil[3]{%
    Department of Biostatistics\\
    Brown University\\
    Providence, Rhode Island, USA
  }
\affil[*]{%
    Equal contributors
  }
  
  \begin{document}
\maketitle

\begin{abstract}
  Prior work applying semiparametric theory to causal inference has primarily focused on deriving estimators that exhibit statistical robustness under a prespecified causal model that permits identification of a desired causal parameter. However, a fundamental challenge is correct specification of such a model, which usually involves making untestable assumptions. Evidence factors is an approach to combining hypothesis tests of a common causal null hypothesis under two or more candidate causal models. Under certain conditions, this yields a test that is valid if at least one of the underlying models is correct, which is a form of causal robustness. We propose a method of combining semiparametric theory with evidence factors. We develop a causal null hypothesis test based on joint asymptotic normality of $K$ asymptotically linear semiparametric estimators, where each estimator is based on a distinct identifying functional derived from each of $K$ candidate causal models. We show that this test provides both statistical and causal robustness in the sense that it is valid if at least one of the $K$ proposed causal models is correct, while also allowing for slower than parametric rates of convergence in estimating nuisance functions. We demonstrate the effectiveness of our method via simulations and applications to the Framingham Heart Study and Wisconsin Longitudinal Study.
\end{abstract}

\section{INTRODUCTION}

Prior work at the intersection of semiparametric theory and causal inference  has primarily focused on deriving estimators that possess statistical robustness properties under a prespecified causal model that permits identification of a causal parameter of interest. For example, in the backdoor causal model where treatment assignment is assumed to be ignorable given observed covariates, the average causal effect (ACE) is identified via the backdoor formula \citep{robins1986new, pearl2009causality}, and the augmented inverse probability weighted estimator (AIPW) of this parameter \citep{bang2005doubly} exhibits  statistical robustness  to specification of the propensity score and outcome regression estimators. In particular, the AIPW  estimator is \emph{doubly robust}, meaning that it is consistent if either the propensity score or outcome regression estimator is consistent, and it can attain the parametric $n^{-1/2}$ rate of convergence to the true ACE even when using data-adaptive estimators of the propensity score and outcome regression that may have convergence rates slower than $n^{-1/2}$. General semiparametric estimation strategies with similar robustness properties have been derived in settings where the causal model is represented as a causal graph with latent confounders \citep{fulcher2020robust, jung2021estimating, bhattacharya2022semiparametric}. However, valid causal interpretation of these semiparametric estimators relies on correct specification of the causal model. Furthermore, causal models typically include assumptions that are untestable using the observed data, and which can only be justified using scientific arguments---classic examples are the conditional ignorability assumption in the backdoor model and the exclusion restrictions in the instrumental variable (IV) \citep{balke1993ivbounds, angrist1996identification}  and front-door models \citep{pearl1995causal}.

In some cases, there are multiple plausible causal models identifying a causal effect in a single observed dataset. For example, the data may contain a set of covariates for which conditional ignorability is plausible, and also contain a plausible IV. Evidence factors is an approach to combining hypothesis tests of a common causal null hypothesis under two or more candidate causal models \citep{rosenbaum2010evidence, rosenbaum2011approximate, karmakar2019integrating}. Under certain conditions, evidence factors methodology yields a test that is valid if at least one of the underlying causal models is correct, without knowing which of the models is correct. This is a form of \emph{causal robustness} because the test is robust to misspecification of some of the causal models as long as one is correctly specified. This approach allows the analyst to make weaker causal assumptions at the expense of stronger statistical assumptions, since a well-behaved statistical test must be constructed using \emph{each} posited causal model. %This is analogous to how doubly robust estimators are consistent if at least one nuisance estimator is consistent.

In this paper, we propose methods for combining semiparametric theory with evidence factors to produce tests that exhibit both statistical and causal robustness. Our proposed approach is built upon the evidence factors design, where multiple analyses are used to test a common causal null hypothesis using a single dataset. We propose tests based on joint asymptotic normality of multiple asymptotically linear semiparametric estimators, where each estimator is based on a distinct identifying functional derived from a (possibly incorrect) causal model. We show our tests have asymptotically valid type I error rate if at least one of the causal models is correct. 

\textbf{Advantages of our method:} Our tests have several advantages over existing evidence factors methods, including relaxing some of the conditions required by standard evidence factors designs \citep{rosenbaum2010evidence, rosenbaum2011approximate, rosenbaum2021replication}. 
\begin{enumerate}
    \item[(i)] Since our tests are based on semiparametric estimators, they possess the types of statistical robustness discussed above.
    \item[(ii)] We remove the need to demonstrate that the joint distribution of the p-values from multiple tests stochastically dominates the uniform distribution under the null, which is commonly used to demonstrate that the combined p-value from an evidence factors analysis has valid size under the null. Asymptotic validity of our test is guaranteed by joint convergence in distribution of the estimators, which is a consequence of asymptotic linearity of semiparametric estimators.
    \item[(iii)] Finally, our method does not require that the candidate causal models have non-overlapping sources of bias. In other words, our test is valid even if the assumptions of two or more of the candidate causal models are invalidated by the same source of bias; e.g., the same unmeasured confounder.
\end{enumerate}

The weaker conditions of our proposed approach allow us to readily apply our method to complex settings. We illustrate this with two examples that have not been studied before to the best of our knowledge. In the first example, we consider three candidate causal models: backdoor, front-door, and IV. In the second example, we consider three candidate backdoor models with different adjustment sets. 
We evaluate the effectiveness of our proposed test using simulations. We then demonstrate our method with two real-world applications. First, we study the effect of smoking on blood glucose levels using data from the Framingham Heart Study \citep{kannel1968framingham} by combining analyses from backdoor, front-door, and IV models. Second, we compare our method with an evidence factors analysis using data from the Wisconsin Longitudinal Study \citep{karmakar2021reinforced}.

\textbf{Other related work:} In addition to the evidence factors work cited earlier, we note that \citet{sun2021multiply} proposed a multiply robust method for estimating causal effects in a Mendelian randomization setting. Their work is specific to a setting where the candidate models are all IV models. An advantage of our work is that it can be applied in settings where the candidate models are qualitatively distinct. We  also note that there is prior research on specification testing for causal models---e.g., \citet{entner2013data} and \citet{shah2021finding} proposed tests for conditional ignorability models, \citet{bhattacharya2022testability} proposed tests for front-door models, and \citet{pearl1995causal} and \citet{wang2017falsification} proposed tests for IV models. %These tests either rely on collecting additional data outside of the data used for identification or yield valid falsification but not confirmation of the assumptions.
In contrast, we do not aim to test the specification of causal models. Instead, our goal is to test a causal null hypothesis provided that assumptions of at least one of the underlying causal models hold, without knowing which set of assumptions holds. %Further, the authors do not explore implementing tests that exhibit statistical robustness.

\section{MOTIVATING EXAMPLE}\label{sec:motivating}

We first describe an empirical example to motivate our general theory and methods. We present the results of the data analysis for this example in Section~\ref{sec:data}. Because high glucose levels are a cause of diabetes, we are interested in testing the causal null hypothesis that there is no average causal effect (ACE) of smoking on glucose levels. We use data from the Framingham Heart Study \citep{kannel1968framingham} to test this null hypothesis. The data are observational and consist of $n = 3477$ fully observed realizations of the data structure $O=\{C, Z, A, M, Y\}$, which we assume are independent and identically distributed draws from a distribution $P$. In these data, $C$ denotes a set of baseline covariates containing age, sex, BMI, past history of heart disease, and past glucose level; $A$ is binary current smoking status, which is our treatment of interest; $Y$ is glucose level, which is our continuous outcome of interest; $M$ is hypertension, which is our candidate mediator; and $Z$ is past hypertension, which is our candidate IV. We define the ACE of smoking on glucose as $\beta = \E[Y(A = 1)] - \E[Y(A = 0)]$, where $Y(A = 1)$ and $Y(A = 0)$ denote potential outcomes under assignment to smoking and no smoking, respectively. Our causal null hypothesis is $H_0: \beta = 0$.

%Since both potential outcomes are not observed for any individual, 

\begin{figure}[t]
		\begin{center}
			\scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]
					
					\begin{scope}[]
						\path[->, very thick]
						node[] (a) {$A$} 
						node[right of=a] (y) {$Y$}
						node[above left of=a, xshift=1cm] (c) {$C$}
                            node[above right of=y, xshift=-1cm] (u) {$U$}
						node[below of=a, yshift=0.8cm, xshift=1cm] (label) {${\cal M}_1$}
						
						(c) edge[blue] (a)
						(a) edge[blue] (y)
						(c) edge[blue] (y)
                            (u) edge[red, dashed] (y)
                            (u) edge[red, dashed] (a)
						;
					\end{scope}
					
					% \begin{scope}[xshift=5.5cm]
                    \begin{scope}[xshift=4.25cm]   
						\path[->, very thick]
						node[] (a) {$A$}
						node[right of=a] (m) {$M$}
						node[right of=m] (y) {$Y$}
                            node[above right of=m, xshift=-1.4cm] (u) {$U$} 
                            node[below of=m, yshift=0.8cm](label) {${\cal M}_2$}
						
                        (a) edge[blue] (m)
                        (m) edge[blue] (y)
                        (u) edge[blue] (a)
                        (u) edge[red, dashed] (m)
                        (u) edge[blue] (y)
                        (a) edge[red, dashed, bend right=25] (y)
						;
					\end{scope}
\end{tikzpicture}
}
\scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]
     
					\begin{scope} %[yshift=-3.2cm, xshift=1.7cm] %[xshift=10cm]
						\path[->, very thick]
						node[] (z) {$Z$}
						node[right of=z] (a) {$A$}
						node[right of=a] (y) {$Y$}
                            node[above right of=a, xshift=-1.4cm] (u) {$U$} 
						node[below of=a, yshift=0.8cm] (label) {${\cal M}_3$}
						
						(z) edge[blue] (a)
						(a) edge[blue] (y)
                            (u) edge[blue] (a)
                            (u) edge[blue] (y)
                            (u) edge[red, dashed] (z)
                            (z) edge[red, dashed, bend right=25] (y)
						
						;
					\end{scope}

				
				\end{tikzpicture}
			}
		\end{center}
		\caption{Plausible causal models and violations of their assumptions (shown via red dashed edges). ${\cal M}_1$ is a backdoor model, ${\cal M}_2$ is a front-door model, and  ${\cal M}_3$ is an IV model. }
		\label{fig:back-front-door}
\end{figure}

Identification of the causal parameter $\beta$ using the distribution of the observed data relies on assumptions encoded in a causal model.
Here, as is often the case, there are multiple plausible causal models. Figure~\ref{fig:back-front-door} shows three plausible causal models for this study using causal directed acyclic graphs (DAGs) \citep{spirtes2000causation, pearl2009causality}. Each causal DAG only includes the subset of variables important for identification. Solid blue edges represent causal relations that are permitted by the model---i.e., do not violate its identifying assumptions if present in the underlying data generating process---and red dashed edges represent causal relations that are not permitted by the model. 

Model ${\cal M}_1$ assumes that the smoking-glucose relationship is unconfounded given the observed covariates $C$. Under ${\cal M}_1$, the ACE $\beta$ is identified with the observed data parameter $\psi_{1,P}$ given by the backdoor formula
\begin{align}
    \psi_{1,P} &= \E \left[ \mu(1, C) - \mu(0,C) \right],\label{eq:bdoor} 
\end{align}
where $\mu(a, c) := \E(Y \mid A = a, C =c)$ \citep{robins1986new, pearl1995causal}. For brevity, we will  refer to models like ${\cal M}_1$ that permit identification via the backdoor formula as backdoor models. Model ${\cal M}_2$ is a front-door model that assumes that smoking only impacts glucose through its effect on hypertension, but permits unmeasured common causes of smoking and glucose (but not hypertension). Under ${\cal M}_2$, the ACE is identified with the parameter $\psi_{2,P}$  given by
\begin{align}
\label{eq:fdoor}
\begin{split}
\psi_{2,P} &:=\E \left\{ \E \left[ \gamma(M, C) \mid A = 1, C\right] \right.  \\
&\qquad \qquad - \left.\E \left[ \gamma(M, C) \mid A = 0, C\right] \right\},
\end{split}
\end{align}
where $\gamma(m,c) := \E[ \mu(m, A, c)  \mid C = c]$  for $\mu(m,a,c) := \E(Y \mid M = m, A=a, C = c)$ \citep{pearl1995causal}. Model ${\cal M}_3$ is an IV model that assumes prior hypertension is exogenous and only impacts glucose through its effect on smoking, but permits unmeasured common causes of smoking and glucose (but not previous hypertension).  Under ${\cal M}_3$, the ACE is identified with $\psi_{3,P}$ given by
\begin{align}
    \psi_{3,P} &= \frac{\E(Y\mid Z=1) - \E(Y\mid Z=0)}{\E(A\mid Z=1) - \E(A\mid Z=0)} \label{eq:iv}
\end{align}
\citep{balke1993ivbounds, angrist1996identification}. We note that each causal model above also includes non-graphical assumptions, such as positivity, for identification to hold. These will be stated in Section~\ref{sec:numerical}.

Semiparametric estimators that exhibit robustness to nuisance estimation have been developed for $\psi_{1,P}$, $\psi_{2,P}$, and $\psi_{3,P}$. For example, the AIPW estimator of $\psi_{1,P}$ \citep{bang2005doubly} is doubly robust with respect to estimators of the outcome regression and propensity score, the augmented primal IPW estimator of $\psi_{2,P}$ \citep{fulcher2020robust, bhattacharya2022semiparametric} is doubly robust with respect to estimators of the outcome regression and  conditional distribution of the mediator, and the empirical plug-in estimator of $\psi_{3,P}$ does not require any nuisance estimators.

If the assumptions in causal model $\mathcal{M}_k$ hold, a semiparametric estimator of the corresponding identified parameter $\psi_{k,P}$ can be used to construct a statistically robust hypothesis test of the causal null hypothesis that $\beta = 0$. For example, if the backdoor model $\mathcal{M}_1$ holds, then a hypothesis test based on the AIPW estimator will have power tending to one as long as either the outcome regression or propensity score estimator is consistent, and will have asymptotically valid type I error rate as long as the outcome regression and propensity score estimators achieve sufficient rates of convergence (which may be slower than $n^{-1/2}$). 

If the causal model $\mathcal{M}_k$ fails to hold, a hypothesis test based on $\psi_{k,P}$ may not provide any information about whether $\beta = 0$. Furthermore, it is often the case that some of the assumptions in a causal model do not imply any testable constraints on the observed data distribution. Indeed, in all three models proposed in Figure~\ref{fig:back-front-door}, the absence of the red dashed edges is untestable. The IV assumptions can be falsified via an inequality constraint, but not confirmed \citep{pearl1995testability, wang2017falsification}. Therefore, the plausibility of causal models typically relies on substantive arguments. In the context of observational studies, such substantive arguments are frequently tenuous. For example, health consciousness is an unmeasured covariate in the Framingham Heart Study that could impact the likelihood of smoking and impact glucose levels through its effect on diet and exercise. If so, the backdoor model $\mathcal{M}_1$ may not hold. It is also possible that smoking impacts glucose through mechanisms other than hypertension, such as reduced likelihood of exercising or reduced appetite, which would invalidate the front-door model $\mathcal{M}_2$. Finally, past history of hypertension may not be exogenous, because diet and exercise may be associated with both hypertension and glucose, which would invalidate the IV model $\s{M}_3$.

%We call the robustness offered by these  estimators \emph{statistical robustness}.
 %It is exactly these assumptions, represented by the absence of the red dashed edges in Fig.~\ref{fig:back-front-door}, that are the focus of intense scrutiny in causal analyses \citep{koller2009probabilistic, imbens2020potential}.

To relax the reliance on a single causal model, evidence factors  can be used to derive a test of the null hypothesis $H_0 : \beta = 0$ that is valid as long as at least one of $\s{M}_1$, $\s{M}_2$, or $\s{M}_3$ is true, without knowing which is true. This is a form  of \emph{causal robustness}. Evidence factors typically require that the joint distribution of the individual p-values stochastically dominates the uniform distribution under the null. In our approach, asymptotic validity of the test is instead guaranteed by joint convergence in distribution of the estimators, which follows directly from asymptotic linearity of semiparametric estimators. In addition, standard evidence factors analyses require that the source of bias that invalidates one causal model does not necessarily bias other causal models. For example, the presence of an unmeasured confounder $U$ that also causes past history of hypertension, such as health consciousness, biases $\mathcal{M}_3$ and may bias $\mathcal{M}_{1}$ as well unless  the backdoor paths through $Z$ and $U$ are  blocked by $C$. Previous evidence factors literature has used blocking or stratification to preclude such cases \citep{zhao2022evidence, karmakar2021reinforced}, which can reduce effective sample size and statistical power. Our proposed approach relaxes this condition and allows one source of bias to potentially invalidate multiple analyses, and adds statistical robustness to the analyses as described above.


% In the next section, we formally define our setting and propose a semiparametric approach to evidence factors analysis that adds statistical robustness and relaxes the requirements of evidence factors analyses.

\section{METHOD FOR COMBINING EVIDENCE FACTORS}\label{sec:method}

We propose a new method for combining evidence factors that takes advantage of the asymptotic linearity of influence function-based estimators. We first outline our formal problem setup. Let $\beta$ denote the causal parameter of interest, such as the average causal effect or conditional average causal effect. The causal null hypothesis of interest is $H_0 : \beta = 0$. % denote a causal null hypothesis expressed in terms of a   For simplicity of exposition and to make the theoretical discussion more concrete, we will define the null hypothesis in terms of the ACE, as in our motivating example. Thus, our null hypothesis $H_0$ is $\beta \coloneqq \E[Y(a_1)] - \E[Y(a_0)] = 0$. The strong causal null of no individual-level causal effects implies an ACE of 0. This can be tested against a single or two-sided alternative $H_A$. 
We assume the observed data consists of $n$ realizations $O_1, \dots, O_n$ drawn IID from an unknown distribution $P$. %, which is known to lie in a statistical model $\s{P}$. %We denote by $\d{P}_n$ the empirical distribution of the data.% and for any $P$-integrable function $f$, we denote $P f := \int f \, dP$. 
%We denote expectations with respect to $P$ using $\E$. Throughout, we use $Y(X = x)$ to denote the potential outcome under assignment of random variable $X$ to value $x$.
We suppose that the analyst is considering $K > 1$ causal models $\s{M}_1, \dotsc, \s{M}_K$, and that $\psi_{k,P}$ is an identifying functional for $\beta$ under $\s{M}_k$. That is, if the assumptions of ${\cal M}_k$ are true, then $\beta= \psi_{k,P}$, which further implies that $H_0$ holds if and only if $\psi_{k,P}=0$. Hence, if at least one of the causal models $\s{M}_1, \dotsc, \s{M}_K$ is true, then under the causal null hypothesis $H_0$, at least one of $\psi_{1,P}, \dotsc, \psi_{K,P}$ is zero. Equivalently, if at least one of the $K$ causal models is true, then $H_0$ implies that $\prod_{k=1}^K \psi_{k,P} = 0$. This  motivates our approach to testing $H_0$. We note that the reverse implication is not necessarily true; we discuss this further later in this section.  %However, for our tests to have non-trivial power, we make an additional assumption; the generating distribution $P$ is such that if $\psi_{k,P}=0$ for any $k\in \{1, \dots, K\}$ then $H_0$ is true. If this reverse implication does not hold, i.e., if we allow distributions $P$ where $\psi_{k,P} = 0$ but $H_0$ is false, our tests would have trivial power for such distributions. 


%We assume that the under the causal conditions of model $\s{M}_k$, $H_0$ implies $\psi_{k,P} = 0$ for each $k \in \{1, \dotsc, K\}$. The goal is to a test of $H_0$ that has correct type $I$ error rate under the union model $\s{M} := \cup_{k=1}^K \s{M}_k$. In this model, $H_0$ implies that $\psi_k(P) = 0$ for at least one $k$. \tw{One-sided implication allows for valid tests of strong null with CATE or ATE. If reverse implication does not hold, i.e.\ there are distributions $P$ where $\psi_{k,P} = 0$ but $H_0$ is false, would have trivial power at such $P$.}


% \subsection{Asymptotically linear estimators}

For each $k$, we suppose we can construct an asymptotically linear estimator $\psi_{k,n}$ of $\psi_{k,P}$ with influence function $\phi_{k,P}$ under statistical conditions $\s{C}_k$, meaning
$\psi_{k,n} - \psi_{k,P} = \d{P}_n \phi_{k,P} + \fasterthan(n^{-1/2}),$
 where $\d{P}_n f = \frac{1}{n} \sum_{i=1}^n f(O_i)$.  Here, $\phi_{k,P}$ may depend on $P$, and is assumed to satisfy $\E(\phi_{k,P}) = 0$ and $\E(\phi_{k,P}^2) < \infty$. The statistical conditions $\s{C}_k$ typically include rates of convergence and complexity constraints for nuisance estimators such as outcome regression or propensity score estimators, as well as constraints on $P$ such as finite moments or  semiparametric or parametric modeling assumptions.
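
As a concrete illustration, consider the backdoor functional $\psi_{1,P}$ from Section~\ref{sec:motivating}. The following is a sketch under standard regularity conditions rather than a statement of the exact conditions $\s{C}_1$. Writing $\pi(c) := P(A = 1 \mid C = c)$ for the propensity score, a symbol introduced here only for this example, the AIPW estimator is asymptotically linear, provided the nuisance estimators converge sufficiently fast, with the well-known influence function
\begin{align*}
\phi_{1,P}(O) &= \mu(1,C) - \mu(0,C) - \psi_{1,P} \\
&\quad + \left\{\frac{A}{\pi(C)} - \frac{1-A}{1-\pi(C)}\right\}\left\{Y - \mu(A,C)\right\}.
\end{align*}
The first line has mean zero by the definition of $\psi_{1,P}$, and the second line has mean zero because $\E\{Y - \mu(A,C) \mid A, C\} = 0$, so that $\E(\phi_{1,P}) = 0$ as required. An estimator $\phi_{1,n}$ of this influence function can be obtained by replacing $\mu$, $\pi$, and $\psi_{1,P}$ with their estimates.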

 \subsection{Joint Distribution of Asymptotically Linear Estimators}
 Asymptotic linearity implies the marginal convergence result $n^{1/2} (\psi_{k,n} - \psi_{k,P}) \to_d N(0, \sigma_{k,P}^2)$ for $\sigma_{k,P}^2 := \E(\phi_{k,P}^2)$, which can be used to construct asymptotically valid Wald-style confidence intervals for $\psi_{k,P}$. A natural estimator of the asymptotic variance $\sigma_{k,P}^2$ is given by $\sigma_{k,n}^2 := \d{P}_n \phi_{k,n}^2$, where $\phi_{k,n}$ is an estimator of the influence function $\phi_{k,P}$. This is known as the \emph{influence function-based variance estimator} \citep{van2000asymptotic}. However, asymptotic linearity is stronger than marginal convergence. In particular, by the multivariate central limit theorem, asymptotic linearity of any finite collection of estimators implies \emph{joint} convergence in distribution of the estimators. Denoting $\bs\psi_P := (\psi_{1,P}, \dotsc, \psi_{K,P})'$ and $\bs\psi_n := (\psi_{1,n}, \dotsc, \psi_{K,n})'$ as vectors of the true and estimated parameters, respectively, and $\bs\phi_P := (\phi_{1,P}, \dotsc, \phi_{K,P})'$ as the vector of influence functions, if all the statistical conditions $\s{C}_1, \dotsc, \s{C}_K$ hold, then 
 \[ \bs\psi_{n} - \bs\psi_{P} = \d{P}_n \bs\phi_P + \fasterthan(n^{-1/2}).\]
 This implies  $n^{1/2}(\bs\psi_{n} - \bs\psi_P) \to_d N_K(\bs{0}, \bs\Sigma_P)$, where $\bs \Sigma_P$ is defined as {
 \[\begin{pmatrix} \E(\phi_{1,P}^2) & \E( \phi_{1,P} \phi_{2,P}) & \cdots & \E (\phi_{1,P} \phi_{K,P} )\\
 \E (\phi_{2,P} \phi_{1,P} )& \E (\phi_{2,P}^2) & \cdots & \E (\phi_{2,P} \phi_{K,P}) \\
 \vdots & \vdots & \ddots & \vdots \\
 \E (\phi_{K,P} \phi_{1,P}) & \E (\phi_{K,P} \phi_{2,P} ) & \cdots & \E( \phi_{K,P}^2) \end{pmatrix}.\] }
 We can estimate $\bs\Sigma_P$ with the influence function-based covariance estimator $\bs\Sigma_n$, which estimates each entry $\E( \phi_{j,P} \phi_{k,P})$ by $\d{P}_n  \phi_{j,n} \phi_{k,n}$. If $\bs\Sigma_n \to_P \bs\Sigma_P$ and $\bs\Sigma_P$ is invertible, it follows that $n^{1/2}\bs\Sigma_n^{-1/2}(\bs\psi_{n} - \bs\psi_P) \to_d N_K(\bs{0}, \bs{I}_K)$, where $\bs{I}_K$ is the $K \times K$ identity matrix and $\bs\Sigma_n^{-1/2}$ is the inverse of the matrix square root of $\bs\Sigma_n$.

\subsection{Tests of the Implied Null Based on Joint Asymptotic Normality}\label{sec:theory}

 We propose using the joint convergence implied by asymptotic linearity to derive tests of the null hypothesis that $\psi_{k,P} = 0$ for at least one $k$. By the delta method we have
 \[\textstyle n^{1/2} \left( \prod_{k=1}^K \psi_{k,n} -   \prod_{k=1}^K \psi_{k,P} \right) \to_d N \left( 0, \bs\gamma_P' \bs\Sigma_P \bs\gamma_P\right),\]
 where $\bs\gamma_P := (\gamma_{1,P}, \dotsc, \gamma_{K,P})'$ for $\gamma_{k,P} := \prod_{j \neq k} \psi_{j,P}$. We recall that if at least one of the causal models is correctly specified, then the causal null hypothesis $H_0$ implies that $\prod_{k=1}^K \psi_{k,P}= 0$, which then implies that 
 \[  \textstyle T_n := n^{1/2}  \left(\bs\gamma_n' \bs\Sigma_n \bs\gamma_n\right)^{-1/2}\prod_{k=1}^K \psi_{k,n} \to_d N(0,1).\]
 Therefore, a two-sided test of $H_0$ with asymptotically valid type I error rate is given by rejecting at level $\alpha$ if  $|T_n| > q_{1-\alpha/2}$, where $q_p$ denotes the $p$th quantile of a standard normal distribution. The following result formally demonstrates that this proposed test has asymptotically valid type I error rate.

\begin{theorem}\label{thm:test_size}
Suppose that for each $k \in \{1, \dotsc, K\}$, $\psi_{k,n}$ is an asymptotically linear estimator of $\psi_{k,P}$ with influence function $\phi_{k,P}$, $\prod_{k=1}^K \psi_{k,P} = 0$, and $\bs\Sigma_n \to_P \bs\Sigma_P$, where $\bs\gamma_P' \bs\Sigma_P \bs\gamma_P > 0$. Then $P \left(|T_n|  >q_{1-\alpha/2}\right) \longrightarrow \alpha.$
\end{theorem}

A proof of Theorem~\ref{thm:test_size} is given in the Appendix. Theorem~\ref{thm:test_size} is stated in terms of the statistical properties of the test, and we now elaborate on how this relates to our goal of developing tests with causal model and statistical robustness. Theorem~\ref{thm:test_size} implies that if at least one of the causal models $\s{M}_1, \dotsc, \s{M}_K$ is true and \emph{all} of the statistical conditions $\s{C}_1, \dotsc, \s{C}_K$ implying asymptotic linearity of the estimators $\psi_{1,n}, \dotsc, \psi_{K,n}$ are true, then the test that rejects the causal null hypothesis  $H_0: \beta = 0$ when $|T_n| > q_{1-\alpha/2}$ has asymptotic size $\alpha$. Hence, increasing $K$ relaxes the causal conditions at the expense of stronger statistical conditions. By using semiparametric estimators rather than estimators based on parametric models, we increase the statistical robustness in conditions $\s{C}_1, \dotsc, \s{C}_K$.
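
As a concrete illustration of the test statistic, consider $K = 2$. In this case $\bs\gamma_P = (\psi_{2,P}, \psi_{1,P})'$, and writing $\sigma_{12,P} := \E(\phi_{1,P}\phi_{2,P})$ (notation used only for this illustration), the delta-method variance is
\[ \bs\gamma_P' \bs\Sigma_P \bs\gamma_P = \psi_{2,P}^2 \sigma_{1,P}^2 + 2 \psi_{1,P} \psi_{2,P} \sigma_{12,P} + \psi_{1,P}^2 \sigma_{2,P}^2. \]
If $\psi_{1,P} = 0$, $\psi_{2,P} \neq 0$, and $\sigma_{1,P}^2 > 0$, this reduces to $\psi_{2,P}^2 \sigma_{1,P}^2$. Under the conditions of Theorem~\ref{thm:test_size}, $T_n$ is then asymptotically equivalent to $\operatorname{sign}(\psi_{2,P}) \, n^{1/2} \psi_{1,n} / \sigma_{1,P}$, which is, up to sign, the usual Wald statistic for testing $\psi_{1,P} = 0$.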

We now briefly comment on some conditions under which we may not get precise type I error control, and explain why these situations may not be problematic in practice. First, if more than one $\psi_{k,P}$ equals zero, then $\bs\gamma_P = \bs{0}$, which implies that $\bs\gamma_P' \bs\Sigma_P \bs\gamma_P = 0$. Hence, our method only yields precise type I error control when exactly one of $\psi_{1,P}, \dotsc, \psi_{K,P}$ equals zero. If two or more equal zero, then the rate of convergence of $\prod_{k=1}^K \psi_{k,n}$ is faster than $n^{-1/2}$, and so our test will be asymptotically conservative. This will be illustrated in simulations in Section~\ref{sec:numerical} and discussed further in Section~\ref{sec:conclusion}. Briefly, this might occur when the null hypothesis is true and the analyst has successfully specified two or more causal models correctly. In practice, however, we expect a scenario in which the analyst is able to specify more than one model correctly to be exceptionally rare; often our concern is whether even a single model has been correctly specified.
Readers interested in learning more about developing tests in such scenarios may also refer to \citet{miles2021composite} for a test developed in a separate context that has better power in the special case of $K = 2$ and diagonal $\bs\Sigma_P$. To our knowledge, no such test yet exists in the general case. %We also note that any differentiable function $f : \d{R}^K \to \d{R}$ satisfying $f(\b{x}) = 0$ for any $\b{x}$ with at least one zero component must have derivative zero at the origin. Hence, the conservative behavior of the proposed test near the origin is not due to the use of the product function to combine the estimators.
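
To see heuristically why the test is conservative in this case, consider $K = 2$ with $\psi_{1,P} = \psi_{2,P} = 0$, and assume purely for illustration that $\bs\Sigma_P$ is diagonal with positive diagonal entries and that $\bs\Sigma_n \to_P \bs\Sigma_P$. Define $U_{k,n} := n^{1/2} \psi_{k,n} / \sigma_{k,P}$, so that $(U_{1,n}, U_{2,n}) \to_d (U_1, U_2)$ with $U_1$ and $U_2$ independent standard normal random variables. Multiplying the numerator and denominator of $T_n$ by $n^{1/2}$ and applying the continuous mapping theorem then suggests that
\[ T_n \to_d \frac{U_1 U_2}{\left( U_1^2 + U_2^2 \right)^{1/2}}. \]
Since $( U_1^2 + U_2^2 )^{1/2} \geq \max\{ |U_1|, |U_2| \}$, the limit is bounded in absolute value by $\min\{ |U_1|, |U_2| \}$. Under these illustrative assumptions, the limiting rejection probability is therefore at most
\[ P\left( |U_1| > q_{1-\alpha/2} \right) P\left( |U_2| > q_{1-\alpha/2} \right) = \alpha^2, \]
which equals $0.0025$ for $\alpha = 0.05$.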

 Second, we  note that $\bs\gamma_P' \bs\Sigma_P \bs\gamma_P$ may equal $0$ if $\E(\phi_{k,P}^2) = 0$. Hence, precise type I error control using our method also relies on the variances $\E(\phi_{k,P}^2)$ being positive when $\psi_{k,P} = 0$. In some cases, $\psi_{k,P} = 0$ implies that $\E( \phi_{k,P}^2) = 0$. If this happens, our test may again be asymptotically conservative. For example, suppose the null hypothesis $H_0$ is the strong causal null hypothesis that there is no causal effect of a binary treatment $A$ on an outcome $Y$ for any unit in the population. Under the backdoor model, $H_0$ implies that $\psi := \E\{[\mu(1,C) - \mu(0,C)]^2\}= 0$, where $\mu(1,c) - \mu(0,c)$ is the conditional average treatment effect. When $\psi = 0$, its efficient influence function  is 0 \citep{levy2021fundamental}. However, since the strong null hypothesis implies the weak null hypothesis that the ACE equals zero, the problem can be avoided in this case by testing the weak null instead. 

 Finally,  $\bs\gamma_P' \bs\Sigma_P \bs\gamma_P$ may equal $0$ if two or more of the influence functions are linearly dependent under the null hypothesis. Fortunately, this can be checked by the researcher prior to using the method.

 The next result provides conditions under which the power of the test goes to one under fixed alternatives.
 \begin{theorem}\label{thm:test_power}
 Suppose that for each $k \in \{1, \dotsc, K\}$, $\psi_{k,n} \to_P \psi_{k,P}$, where $\prod_{k=1}^K \psi_{k,P} \neq 0$, and $\bs\Sigma_n = \bounded(1)$. Then $P \left(|T_n|  >q_{1-\alpha/2}\right) \longrightarrow 1$.
 \end{theorem}

 A proof of Theorem~\ref{thm:test_power} is provided in the Appendix.  The conditions of Theorem~\ref{thm:test_power} are substantially weaker than those of Theorem~\ref{thm:test_size}. In particular, Theorem~\ref{thm:test_power} only requires consistency of the estimators, which for doubly robust estimators can hold as long as at least one nuisance estimator is consistent.
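
Heuristically, the result can be understood by writing
\[ |T_n| = n^{1/2} \, \frac{\left| \prod_{k=1}^K \psi_{k,n} \right|}{\left( \bs\gamma_n' \bs\Sigma_n \bs\gamma_n \right)^{1/2}}, \]
where we take $\bs\gamma_n$ to be $\bs\gamma_P$ with each $\psi_{j,P}$ replaced by $\psi_{j,n}$. Under the stated conditions, $| \prod_{k=1}^K \psi_{k,n} |$ converges in probability to $| \prod_{k=1}^K \psi_{k,P} | > 0$, while $\bs\gamma_n' \bs\Sigma_n \bs\gamma_n = \bounded(1)$ because $\bs\gamma_n \to_P \bs\gamma_P$ and $\bs\Sigma_n = \bounded(1)$. Hence $|T_n|$ grows at rate $n^{1/2}$ and exceeds any fixed critical value with probability tending to one. This is only a sketch of the argument, not a substitute for the proof in the Appendix.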

 We note that $\prod_{k=1}^K \psi_{k,P} \neq 0$ requires that each $\psi_{k,P} \neq 0$. If $\s{M}_k$ is a correct causal model, then $\psi_{k,P} \neq 0$ if and only if $\beta \neq 0$. However, if $\s{M}_k$ is invalid, then $\psi_{k,P}$ does not necessarily have any correspondence with $\beta$, and hence $\psi_{k,P}$ may equal 0 even if $\beta \neq 0$. Hence, the power of the proposed test may not converge to one under certain alternatives even if at least one of $\s{M}_1, \dotsc, \s{M}_K$ is true and all of the statistical conditions $\s{C}_1, \dotsc, \s{C}_K$ are true. This phenomenon will be illustrated in numerical studies in Section~\ref{sec:numerical}. %Avoiding the assumption that $\beta \neq 0$ implies $\psi_{k,P} \neq 0$ would improve the power of the test in some cases, but
 It appears that developing a consistent test in situations where $\beta \neq 0$ but some $\psi_{k,P} = 0$ would require being able to determine which models are invalid, which as discussed above is typically not possible. However, in some cases, even when $\s{M}_k$ is invalid, $\psi_{k,P} = 0$ is an ``unlikely'' event when $\beta \neq 0$ in the sense that it requires exact cancellations of certain causal effects. This is related to the \emph{faithfulness} assumption in DAGs \citep{spirtes2000causation}, which states that (conditional) independence between variables under $P$ can always be attributed to the structure of the causal graph. In causal graphical selection, $P$ is often assumed to be faithful with respect to a causal graph with the justification that unfaithful distributions are rare \citep{spirtes2000causation}.
If 1) the distribution $P$ is faithful and 2) $\psi_{k, P}=0$ if and only if $Y\ci A \mid R$, where $R$ denotes other observed variables appearing in $\psi_{k, P}$, then $\beta \not=0$ implies that $\psi_{k, P}\not=0$. Condition 2) holds in, for example, some linear Gaussian models. An example of a causal model violating faithfulness is shown in the Appendix.
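
As a simple hypothetical illustration of such an exact cancellation (a toy linear model with a continuous treatment, not one of the models studied in this paper), suppose $Y = \beta A + \delta U + \varepsilon$ and $A = \rho U + \nu$ for mutually independent, mean-zero $U$, $\nu$, and $\varepsilon$, where $U$ is unmeasured. The unadjusted regression slope of $Y$ on $A$ is
\[ \frac{\mathrm{Cov}(A, Y)}{\mathrm{Var}(A)} = \beta + \frac{\delta \rho \, \mathrm{Var}(U)}{\rho^2 \mathrm{Var}(U) + \mathrm{Var}(\nu)}. \]
With $\beta \neq 0$, this slope equals zero only if the confounding bias term is exactly $-\beta$, which restricts $(\beta, \delta, \rho, \mathrm{Var}(U), \mathrm{Var}(\nu))$ to a lower-dimensional subset of the parameter space.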


\def\arraystretch{1.4}
\begin{figure*}[t]
  \centering
 \includegraphics[width=0.49\linewidth]{./images/backdoor_frontdoor_iv_null.png}
  \includegraphics[width=0.49\linewidth]{./images/backdoor_frontdoor_iv_alternative.png}
  \caption{Size (left) and power (right) of the test as a function of sample size when at least one of the backdoor, front-door, or IV models is correct. Panel labels indicate which model(s) are correct (TRUE) and incorrect (FALSE).}
  \label{fig: backdoor-frontdoor-iv}
\end{figure*}

\section{PRACTICAL APPLICATIONS OF THE GENERAL METHOD}\label{sec:numerical}

As noted in Section~\ref{sec:method}, our method can be applied to any set of causal models as long as we can construct asymptotically linear estimators of each $\psi_{k, P}$. Recent developments in semiparametric theory allow us to do this for any identified query of the ACE given a causal graph with unmeasured confounders \citep{bhattacharya2022semiparametric, jung2021estimating}.

We highlight two important examples here: (i) three qualitatively distinct causal models (backdoor, front-door, and IV), and (ii) multiple plausible backdoor models. We assess the performance of our proposed test using numerical studies in both examples. In Section~\ref{sec:data} we demonstrate an application of (i) to the Framingham Heart Study, as highlighted in our motivating example. We also compare our method with prior evidence factors work on the Wisconsin Longitudinal Study, using two distinct IV models and a backdoor model.


\subsection{Backdoor, Front-door, and IV models}


We return to the three candidate causal models introduced in Section~\ref{sec:motivating} and displayed in Figure~\ref{fig:back-front-door}: the backdoor, front-door, and IV models.  Before describing the numerical study, we provide additional details about the causal models and estimators. We are interested in testing the weak causal null hypothesis $H_0 : \beta = \E[Y(A = 1) - Y(A = 0)] = 0$. Causal model $\s{M}_1$ is the backdoor model. In addition to SUTVA and consistency, the assumptions of $\s{M}_1$ are:  (i) $Y(A = a) \ci A \mid C$ for $a \in \{0,1\}$ (conditional ignorability), and (ii) $0 < \pi(C) < 1$ almost surely, where $\pi(c) := P(A = 1 \mid C=c)$ (positivity). Under these conditions, $\beta = \psi_{1,P}$ defined in~\eqref{eq:bdoor}. The nonparametric efficient influence function of $\psi_{1,P}$ is $\phi_{1,P} = \phi_{1,P}^\circ - \psi_{1,P}$, where $\phi_{1,P}^\circ(y,a,c)$ is given by
\begin{align*}
\left\{ y - \mu(a,c)\right\}\left\{ \frac{a - \pi(c)}{\pi(c)[1-\pi(c)]} \right\} +\left\{\mu(1,c) - \mu(0,c)\right\}.
\end{align*}
The AIPW estimator \citep{bang2005doubly} is an asymptotically linear estimator of $\psi_{1,P}$ with influence function $\phi_{1,P}$ under doubly robust conditions on estimators $\mu_n$ and $\pi_n$ of $\mu$ and $\pi$, respectively.
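For concreteness, one standard way of writing the resulting estimator and its influence function-based variance estimator in the notation above (a sketch that ignores refinements such as cross-fitting) is
\[ \psi_{1,n} = \d{P}_n \phi_{1,n}^\circ, \qquad \sigma_{1,n}^2 = \d{P}_n \left( \phi_{1,n}^\circ - \psi_{1,n} \right)^2, \]
where $\phi_{1,n}^\circ$ denotes $\phi_{1,P}^\circ$ with $\mu$ and $\pi$ replaced by $\mu_n$ and $\pi_n$. Note that for binary $a$, $\{a - \pi(c)\} / \{\pi(c)[1 - \pi(c)]\} = a / \pi(c) - (1 - a) / \{1 - \pi(c)\}$, so the first term of $\phi_{1,P}^\circ$ is the familiar inverse probability weighted residual.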

Causal model $\s{M}_2$ is the front-door model. The key assumptions of $\s{M}_2$ are: (i) $Y(A=a, M=m)= Y(M=m)$ for $a, m \in \{0,1\}$ (no direct effect of treatment on the outcome); (ii) $Y(M = m) \ci M(A = a) \mid C$ for $a, m \in \{0,1\}$ (conditional ignorability of the mediator-outcome relationship); (iii) $M(A=a) \ci A \mid C$ for $a \in \{0,1\}$ (conditional ignorability of the treatment-mediator relationship); (iv) $Y(M = m) \ci M \mid A, C$; and (v) $0 < P(A = a, M = m \mid C) < 1$ almost surely for each $a, m \in\{0,1\}$  (positivity). Unobserved confounding between $A$ and $Y$ is permitted. Under these conditions, $\beta = \psi_{2,P}$ defined in~\eqref{eq:fdoor}. The nonparametric efficient influence function $\phi_{2,P}(y,m,a,c)$ of $\psi_{2,P}$ is
\begin{align*}
&\frac{\alpha(m \mid 1, c) - \alpha(m \mid 0, c)}{\alpha(m \mid a, c)} \left\{y - \mu(m,a,c)\right\} \\
&\qquad + \left\{\frac{a - \pi(c)}{\pi(c)[1-\pi(c)]} \right\} \left\{ \gamma(m,c) - \tau(a,c)\right\} \\
&\qquad + \left\{ \eta(1, a, c) - \eta(0,a,c) \right\} - \psi_{2,P}, 
\end{align*}
where $\alpha(m \mid a, c) := P(M = m \mid A=a ,C =c)$,  $\eta(a_0, a, c) :=  \E[\mu(M, a,c) \mid A = a_0, C =c]$, and $\tau(a,c) := \E[\eta(a, A, c) \mid C = c ]$.  The augmented primal IPW estimator of $\psi_{2,P}$ \citep{fulcher2020robust, bhattacharya2022semiparametric} is asymptotically linear with influence function $\phi_{2,P}$ under doubly robust conditions on estimators of the sets $\{\pi, \mu\}$ and $\{\alpha\}$.
% \begin{align*}
% \eta(a_0, a, c) &:=  E[\mu(M, a,c) \mid A = a_0, C =c] \\% = \\ \int \mu(m,a,c) \, dP(m \mid a_0, c)\\
% \tau(a,c) &:= 
% E[\eta(a, A, c) \mid C = c ] = \\ \iint \mu(m,\bar{a},c) \, dP(m \mid a, c) \, dP(\bar{a} \mid c).
% \end{align*}

Finally, causal model $\s{M}_3$ is an IV model. The key assumptions of $\s{M}_3$ are: (i) $Y(Z = z) \ci Z$, for $z \in \{0,1\}$ (randomized instrument); (ii)  $Y(Z =z, A = a) = Y(A = a)$ for each $a,z \in \{0,1\}$ % $Y(Z =1, A(Z=1)=0) = Y(Z=0, A(Z=0)=0)$ and $Y(Z=1, A(Z=1)=1) = Y(Z=0, A(Z=0)=1)$ 
(no direct effect of the instrument on the outcome); (iii) $P(A(Z=0)=1, A(Z=1)=0)=0$ (monotonicity); (iv) $\E[A(Z=1) - A(Z=0)] \neq 0$ (non-null effect of the instrument on treatment); (v) $\mathrm{Var}\{Y(A = 1) - Y(A=0)\} = 0$ (homogeneity); and (vi) $0 < P(Z = 1) < 1$ (positivity). Unobserved confounding of the treatment-outcome relationship is again permitted. 
Under these conditions, $\beta = \psi_{3,P}$ defined in~\eqref{eq:iv}. We note that without the homogeneity assumption, $\psi_{3,P}$ instead identifies the ACE among compliers; we therefore impose homogeneity here so that $\psi_{3,P}$ identifies our actual target $\beta$. The nonparametric efficient influence function $\phi_{3,P}(y,a,z)$ of $\psi_{3,P}$ is
\begin{align*}
 &\left[\left\{y - \mu(z)\right\} \left\{\pi(1) - \pi(0)\right\} - \left\{ a - \pi(z)\right\} \left\{ \mu(1) - \mu(0)\right\} \right] \\
    &\qquad \quad \times \frac{z / \zeta - (1-z)/(1-\zeta)}{\{\pi(1) - \pi(0)\}^2},
\end{align*}
where $\mu(z) := \E(Y \mid Z = z)$, $\pi(z) := P(A = 1 \mid Z = z)$, and $\zeta := P(Z = 1)$. Since $A$ and $Z$ are binary, an asymptotically linear estimator of $\psi_{3,P}$ with influence function $\phi_{3,P}$ can be constructed by replacing the conditional expectations in the definition of $\psi_{3,P}$ given in~\eqref{eq:iv} with empirical conditional expectations.
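Assuming that~\eqref{eq:iv} is the usual Wald ratio $\{\mu(1) - \mu(0)\} / \{\pi(1) - \pi(0)\}$, which is the form consistent with the influence function above, this plug-in estimator can be written as
\[ \psi_{3,n} = \frac{\mu_n(1) - \mu_n(0)}{\pi_n(1) - \pi_n(0)}, \quad \mu_n(z) := \frac{\sum_{i=1}^n Y_i \, \mathbb{1}\{Z_i = z\}}{\sum_{i=1}^n \mathbb{1}\{Z_i = z\}}, \quad \pi_n(z) := \frac{\sum_{i=1}^n A_i \, \mathbb{1}\{Z_i = z\}}{\sum_{i=1}^n \mathbb{1}\{Z_i = z\}}, \]
and an estimator $\phi_{3,n}$ of the influence function is obtained by substituting $\mu_n$, $\pi_n$, and $\zeta_n := n^{-1} \sum_{i=1}^n Z_i$ for $\mu$, $\pi$, and $\zeta$.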

%The simulations have three goals: (i)  Demonstrate that our proposed test has asymptotically valid size under different types of null hypotheses, as long as at least one of the causal models is correct. (ii) Compare the size of the test when just one $\psi_k = 0$ versus when multiple $\psi_k = 0$. We expect the size will be smaller than nominal when all $\psi_k = 0$. (iii) Assess the power of the test at alternatives. We expect the test to have power going to 1 as long as all $\psi_k \neq 0$, which corresponds to a true alternative as long as at least one of the causal models is correct. Note that we don't expect to have power going to 1 if any $\psi_k = 0$, which could still correspond to a true alternative if causal model $k$ is incorrect. 

We note that it is possible that $\s{M}_1$, $\s{M}_2$, and $\s{M}_3$ are invalidated by a common source of bias. For example, if $Z$ has a direct effect on $Y$, this invalidates both the IV model $\s{M}_3$ and the backdoor model $\s{M}_1$ (if $Z$ is not in the adjustment set $C$). Unlike previous evidence factors analyses \citep{karmakar2021reinforced, zhao2022evidence}, we do not alter the adjustment sets nor impose any restrictions on the order of analyses to prevent the source of bias of one model from invalidating others. 

In the first numerical study, we consider testing the causal null hypothesis $H_0 : \beta = 0$ against the two-sided alternative using our proposed test with $K = 3$ and the three causal models $\s{M}_1$, $\s{M}_2$, and $\s{M}_3$ defined above. We consider settings where the assumptions of all causal models hold, where the assumptions of two of the models hold, and where the assumptions of just one model hold. For each setting, we consider data-generating distributions where the identified functional in the incorrect models is 0 or is different from 0, because we expect this to impact the rejection rate of the test, as discussed in Section~\ref{sec:theory}. To violate the assumptions of $\s{M}_1$, we either include unmeasured confounders or adjust for colliders. To violate the assumptions of $\s{M}_2$, we either include an effect of $A$ on $Y$ not mediated through $M$ or include unmeasured confounding between $A$ and $M$ or between $M$ and $Y$. To violate the assumptions of $\s{M}_3$, we include a direct effect of $Z$ on $Y$, include unmeasured confounding between $Z$, $A$, and $Y$, or violate monotonicity. To simultaneously violate the assumptions of $\s{M}_1$ and $\s{M}_3$, we use a common source of bias: a direct effect of $Z$ on $Y$. The full details of the data-generating processes for each setting are in the Appendix.

For each data-generating distribution, we simulate data under the null and alternative hypotheses for sample sizes $n\in \{100, 250, 500, 750, 1000\}$. For each simulated dataset, we use  our proposed test with the estimators and influence functions described above. We estimate outcome regression and propensity score functions using generalized additive models. For each setting and sample size, we conduct 1000 simulations and record the fraction of the time that our test rejected the null hypothesis at level $\alpha = 0.05$. 
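As a rough guide for interpreting the empirical rejection rates, with 1000 replications the Monte Carlo standard error of an estimated rejection probability whose true value is the nominal level is approximately
\[ \sqrt{0.05 \times 0.95 / 1000} \approx 0.007, \]
so empirical sizes between roughly $0.036$ and $0.064$ are within about two Monte Carlo standard errors of $\alpha = 0.05$.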

Figure~\ref{fig: backdoor-frontdoor-iv} displays the size and power of the test as a function of sample size under the different settings. The results are consistent with our expectations based on the theory of Section~\ref{sec:theory}. Under the null (left panel of Figure~\ref{fig: backdoor-frontdoor-iv}), the size of the test converges to $\alpha = 0.05$ when two of the causal models are wrong and both identified functionals in the wrong models are not zero. The size is close to zero when more than one of the causal models is correct or when the identified functional in the wrong model is zero. This is because, as discussed in Section~\ref{sec:theory}, our test is conservative when more than one $\psi_{k,P}$ equals zero. Under the alternative (right panel of Figure~\ref{fig: backdoor-frontdoor-iv}), the power of the test converges rapidly to 1 in all cases where the identified functionals in the wrong models are not zero. The test has low power when the identified functional in the wrong model equals zero, as discussed in Section~\ref{sec:theory}.

We also consider our proposed test with $K = 2$ using all three pairs of models: $\s{M}_1$ and $\s{M}_2$, $\s{M}_1$ and $\s{M}_3$, and $\s{M}_2$ and $\s{M}_3$. The simulation results for these settings can be found in the Appendix, and again align with our theoretical expectations.

% \begin{figure*}[ht]
%   \centering
%   \subfigure[]{\includegraphics[scale=0.5]{./images/backdoor_frontdoor_iv_null.png}}\quad
%   \subfigure[]{\includegraphics[scale=0.5]{./images/backdoor_frontdoor_iv_alternative.png}}
%   \caption{(a) The sizes of the tests when at least one of the three casual models hold. (b) The powers of the tests when at least one of the three casual models hold.}
%   \label{fig: backdoor-frontdoor-iv}
% \end{figure*}


\begin{figure*}[t!]
  \centering
   \includegraphics[width=0.49\linewidth]{./images/backdoor_adjustment_null.png}
  \includegraphics[width=0.49\linewidth]{./images/backdoor_adjustment_alternative.png}
  \caption{Size (left) and power (right) of the test as a function of sample size for the second numerical study.}
  \label{fig:backdoor_adjust}
\end{figure*}

\subsection{Multiple Backdoor Models}

In the second example, we consider $K = 3$ backdoor models with different adjustment sets. Figure~\ref{fig:back-adjustments} displays the true causal graph. The adjustment set of the first backdoor model is $\{C_1, C_2, C_3, C_4\}$. This model is correct because this set satisfies the backdoor criterion with respect to $A$ and $Y$. The second adjustment set is $\{C_1, C_3\}$, and the third adjustment set is $\{C_1, C_4\}$; both of these adjustment sets are invalid because they omit the confounder $C_2$. As long as the common source of bias shared by multiple analyses does not affect all candidate models, our approach can still be valid, which is again one of the stated advantages of our method over standard evidence factors designs.


\begin{figure}[h]
		\centering
			\scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]
					     
					% \begin{scope}[]
     %                        \path[->, very thick]
					% 	node[] (a) {$A$}
					% 	node[right of=a] (y) {$Y$}
     %                        node[left of=a] (c4) {$C_4$} 
     %                        node[above right of=a, xshift=-0.5cm] (c) {$C_2, C_3$}
     %                        node[right of=y](c1) {$C_1$}
                            
						
     %                    (a) edge[blue] (y)
     %                    (c) edge[blue] (a)
     %                    (c) edge[blue] (y)
     %                    (c4) edge[blue] (a)
     %                    (c4) edge[blue, bend left] (y)
     %                  %  (u) edge[blue] (y)
     %                    (a) edge[blue, bend left] (c1)
     %                    (y) edge[blue] (c1)
					% 	;
						
					% \end{scope}

     	          \begin{scope}[]
                            \path[->, very thick]
						node[] (a) {$A$}
						node[right of=a] (y) {$Y$}
                            node[left of=a] (c1) {$C_1$}
                            node[above of=a] (c2) {$C_2$} 
                            node[above of=y] (c3) {$C_3$}
                            node[above of=c1] (c4) {$C_4$}
                            
						
                        (a) edge[blue] (y)
                        (c1) edge[blue] (a)
                        (c2) edge[blue] (a)
                        (c2) edge[blue] (y)
                        (c3) edge[blue] (y)
						;
						
					\end{scope}
				
				\end{tikzpicture}
			}
		\caption{True causal DAG for the backdoor models.}
		\label{fig:back-adjustments}
\end{figure}


We use our proposed test with three AIPW estimators with the three different adjustment sets. We use generalized additive models to estimate the outcome regression and propensity score. Figure~\ref{fig:backdoor_adjust} displays the results of the second numerical study. The results are consistent with our expectations.  Under the null, the size of the test converges to $\alpha = 0.05$, but is slightly anti-conservative for $n = 250$. Under the alternative, the power of the test is close to 1 for all $n$  because in this case, the identified functionals in the backdoor models with invalid adjustment sets are not  zero.

\section{REAL DATA APPLICATIONS}\label{sec:data}

In this section we evaluate our methods using two real-world studies. The first is the Framingham Heart Study, introduced in our example in Section~\ref{sec:motivating}. The second is the Wisconsin Longitudinal Study, which has been analyzed using classical evidence factors methods by \citet{karmakar2021reinforced} and thus allows us to compare our methods with prior work.

\subsection{Framingham Heart Study}


We first use our methods to test the effect of smoking on glucose levels using the Framingham Heart Study \citep{kannel1968framingham}. We use the backdoor, front-door, and IV models defined in Sections~\ref{sec:motivating} and~\ref{sec:numerical} as our candidate models.
Our treatment $A$ is a binary indicator of current smoking status and our outcome $Y$ is a continuous measure of blood glucose level. We adjust for baseline covariates $C$ containing age, sex, BMI, past history of heart disease, and past glucose level in the backdoor model.  We propose hypertension as a candidate mediator $M$  for the front-door model, and past history of hypertension as a candidate instrumental variable $Z$ for the IV model. We estimate the ACE in each candidate model using the methods described in Section~\ref{sec:numerical}.

Table~\ref{table:framingham} displays the estimates, 95\% confidence intervals, and p-values from the tests of the null hypothesis of zero ACE using each causal model individually. The tests based on the backdoor and front-door models fail to reject the null hypothesis that smoking status has no effect on glucose levels. The test based on the IV model rejects the null hypothesis at significance level $0.05$ and produces an estimated ACE less than zero, suggesting that smoking reduces glucose levels. However, these results all rely on validity of the single causal model on which they are based. The joint test proposed here is valid if any of the three causal models is valid and returns a p-value of $0.68$. Hence, we do not find evidence of a statistically significant causal effect of smoking on glucose levels.


\begin{table}[hbt!]
\centering
\begin{tabular}{ |c|c|c| } 
\hline
 \textbf{Method} & ${\bf \widehat{ACE}}$ \textbf{(95\% CI)} & \textbf{p-value} \\  
 \hline
 Backdoor & 0.32 (-1.2,  1.8) & 0.67 \\ 
 \hline
 Front-door & -0.038 (-0.090,  0.014) & 0.15 \\
 \hline
 IV & -47.7 (-62.8, -32.6) & $ 6.5\times 10^{-10}$ \\
 \hline
\end{tabular}
\caption{Results from the analysis of the effect of smoking on glucose from the Framingham Heart Study.}
\label{table:framingham}
\end{table}
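
As a rough check on how the entries of Table~\ref{table:framingham} relate to one another, assume the reported intervals are the Wald-style intervals described in Section~\ref{sec:method}, so that the half-width equals $q_{0.975}$ times the standard error with $q_{0.975} \approx 1.96$. For the backdoor row, the implied standard error and test statistic are approximately
\[ \frac{1.8 - (-1.2)}{2 \times 1.96} \approx 0.77, \qquad \frac{0.32}{0.77} \approx 0.42, \]
and the corresponding two-sided p-value is $2\{1 - \Phi(0.42)\} \approx 0.67$, where $\Phi$ denotes the standard normal distribution function. This agrees with the reported value up to rounding of the table entries.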

\subsection{Wisconsin Longitudinal Study}

We next evaluate our method with the Wisconsin Longitudinal Study (WLS) dataset from the \texttt{R} package \texttt{blockingChallenge} \citep{blockingChallenge}. We compare our methods and results to the evidence factors analysis of these data \citep{karmakar2021reinforced}. 

The WLS data contains a sample of 4450 male students who completed high school in Wisconsin in 1957. The binary exposure of interest is whether the student attended a Catholic high school, and the outcome is income in 1974. \citet{karmakar2021reinforced} considered three causal models: (1) an IV model using whether the student's family resided in an urban or rural area during high school as an instrument; (2) an IV model using whether the student's family was Catholic as an instrument and urban/rural residence as a covariate; and (3) a backdoor model adjusting for both urban/rural residence and Catholic religion as covariates. Each model also included IQ score prior to high school, father's and mother's education, parents' income, father's occupation score, and occupational prestige score as covariates. Letting $\beta$ be the ACE of attending a Catholic school on income, we use the methods of \citet{karmakar2021reinforced} to test the null hypothesis that $\beta = 0$ versus the alternative that $\beta \neq 0$ in these three models, and combine these three causal models using evidence factors methodology. We assume that at least one model is correct, so we combine the individual p-values from the evidence factors analysis by taking the maximum of the three. 
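The validity of this maximum rule can be seen as follows. Let $p_1, p_2, p_3$ denote the three individual p-values and let $k^*$ index a correct model (both notations are introduced only for this remark). If $p_{k^*}$ is a valid p-value under the null, then
\[ P\left( \max_{k} p_k \leq \alpha \right) \leq P\left( p_{k^*} \leq \alpha \right) \leq \alpha, \]
regardless of the behavior of the other two p-values. For the evidence factors p-values reported in Table~\ref{table:WLS}, the combined p-value is the largest of the three individual values, namely $0.0098$.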

We apply our proposed test with the three causal models described above with slight modifications using the methods described in Section~\ref{sec:numerical}. For the two IV models, we do not adjust for any covariates, and for the backdoor model, we adjust for all covariates excluding the two candidate IVs. 

% We refer readers to \cite{karmakar2021reinforced} and the documentation of \texttt{blockingChallenge} for additional details of the data and methods.

\begin{table}[hbt!]
\centering
\begin{tabular}{ |c c c c| } 
\hline
 \textbf{Urban IV} & \textbf{Catholic IV} & \textbf{Backdoor} & \textbf{Combined}\\  
 \hline
 \multicolumn{4}{|c|}{\textbf{Evidence Factors Analysis}}\\
\hline
 $<$ 0.0001 & 0.0084 & 0.0098 & 0.0098 \\ 
 \hline
 \multicolumn{4}{|c|}{\textbf{Asymptotic Joint Test}}\\
 \hline
 $3.3 \times 10^{-14}$ & 0.0094 & 0.0004 & 0.0950 \\
 \hline
\end{tabular}
\caption{Results comparing our method to evidence factors analysis in analyzing the effect of Catholic schooling on wages from the Wisconsin Longitudinal Study.}
\label{table:WLS}
\end{table}

% \begin{table}[hbt!]
% \centering
% \begin{tabular}{ |c c c c| } 
% \hline
% \multicolumn{4}{|c|}{\textbf{Evidence Factors Analysis}}\\
% \hline
%  \textbf{Urban IV} & \textbf{Catholic IV} & \textbf{Backdoor} & \textbf{Combined}\\  
%  \hline
%  \multicolumn{4}{|c|}{${H_0: \beta \leq 0}$}\\
% \hline
%  0.0000 & 0.0065 & 0.0149 & 0.0149 \\ 
%  \hline
%  \multicolumn{4}{|c|}{${H_0: \beta \leq 500}$}\\
%  \hline
%  0.0000 & 0.0394 & 0.2013 & 0.2013 \\
%  \hline
%  \multicolumn{4}{|c|}{\textbf{Asymptotic Joint Test}}\\
%  \hline
%  \textbf{Urban IV} & \textbf{Catholic IV} & \textbf{Backdoor} & \textbf{Combined}\\ 
%  \hline
%  \multicolumn{4}{|c|}{${H_0: \beta \leq 0}$}\\
%  \hline
%  0.0002 & $1.6 \times 10^{-14}$ & 0.0047 & 0.047 \\
%  \hline
%  \multicolumn{4}{|c|}{${H_0: \beta \leq 500}$}\\
%  \hline
%  0.0044 & $1.0 \times 10^{-13}$ & 0.017 & 0.092\\
%  \hline
% \end{tabular}
% \caption{Results comparison of our method and evidence factor analysis in analyzing the effect of Catholic schooling on wages from the Wisconsin longitudinal study.}
% \label{table:WLS}
% \end{table}



 %We estimate the ACE in the backdoor model . The two IV models include adjustment covariates, so the IV methods discussed in Section~\ref{sec:numerical} do not directly apply. To estimate the ACE in the IV models with covariates, we take the ratio of AIPW estimators of the numerator and denominator of the IV estimand, and combine the influence functions (provided in Section~\ref{sec:numerical}) using the delta method. 

Table~\ref{table:WLS} displays the p-values from the three individual tests and the combined test using the evidence factors methodology and our methodology. While all individual p-values are statistically significant at the $0.01$ level, our combined p-value is not. This is because the three individual p-values using our proposed models are positively correlated, whereas the p-values from the evidence factors methods are nearly independent of each other under the null as a result of the careful construction of each evidence factors analysis. In particular, the estimated correlations between the AIPW estimator and the IV estimators from the Catholic religion and urban/rural IV models are 0.42 and 0.22, respectively. Therefore, whereas the combined p-value from the evidence factors analysis simply takes the maximum among the three p-values, our method takes into account the correlations among the three tests. Our method produces a valid test even if the individual p-values are positively correlated and does not require particular causal models to make p-values from each analysis nearly independent under the null.
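To connect the combined p-value to the test statistic, assume that it is computed as the two-sided normal p-value $2\{1 - \Phi(|T_n|)\}$, where $\Phi$ denotes the standard normal distribution function. The reported value of $0.0950$ then corresponds to
\[ |T_n| = \Phi^{-1}(1 - 0.0950/2) \approx 1.67, \]
which falls below the critical value $q_{0.975} \approx 1.96$ for a level $0.05$ test.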


\section{CONCLUSION}
\label{sec:conclusion}
Many of the assumptions of causal models in the context of observational data are strong and empirically untestable. It is desirable to use methods that are as robust as possible in such settings in order to relax the strength of the assumptions. In this paper, we proposed a method of testing a causal null hypothesis in the presence of several candidate causal models that provides both statistical and causal robustness. Our test is valid if at least one of the proposed causal models is correct, without knowing which one is correct. Furthermore, our test is based on semiparametric estimators, which possess desirable statistical robustness properties. Our methods also relax standard evidence factors conditions in two ways: we remove the requirement that non-overlapping biases invalidate the causal models, and we do not need to show the distribution of the p-values from each factor dominates the uniform distribution under the null. This has allowed us to apply our method to new settings for which evidence factors have not yet been developed. We expect there are applications of our work to additional new settings, as well as extensions to causal sensitivity analysis.

The relaxation of the second condition comes at the cost of statistical power when more than one causal model is correct. Some evidence factor analyses allow researchers to assume $J \geq 2$ of the $K$ causal models are correct, without knowing which $J$ causal models are correct \citep{rosenbaum2010evidence, rosenbaum2011approximate}. %When each element of the $p$-values derived from these models is stochastically larger than the uniform under the null, then these $k$ $p$-values can be combined, for example, using Fisher's method \citep{fisher1936statistical}. 
The resulting combined test is more powerful as $J$ increases, at the expense of stronger conditions and less robustness to invalid causal models. In particular, if the practitioner assumes that $J > 1$ models are correct when in truth fewer than $J$ are correct, then the resulting test has an invalid type I error rate. Here, we only considered the situation where $J=1$; if the number of true causal models exceeds one, our test is valid but tends to be conservative. Extending our approach to settings where $J \geq 2$ models are correct is an important area of future research. 

Our theory covers the case where the number of causal models $K$ is fixed, and we were primarily focused on the situation where $K$ is relatively small. Another interesting area of future research is to quantify the trade-offs in robustness and power as a function of $K$ and the dependence between the estimators in each model. We expect that increasing $K$ typically comes with a reduction in power. However, we also believe that qualitatively distinct causal models, such as the backdoor, front-door, and IV models considered here, lead to less power reduction than qualitatively similar models, such as multiple backdoor models, because the power of the combined test is lower when the individual p-values are positively correlated.

Finally, we focused here on testing causal null hypotheses because testing is the main focus of the evidence factors literature and is an important aspect of causal inference across various disciplines such as epidemiology \citep{swanson2018causal}, political science \citep{eggers2023placebo}, and economics \citep{angrist2011causal}. However, as with evidence factors, we expect that our tests can be inverted to construct robust confidence sets. This too is an important topic of future research.







\begin{acknowledgements} 
The authors gratefully acknowledge support from NSF grant 2113171 (TW) and the helpful feedback of four anonymous reviewers.
\end{acknowledgements}

% \clearpage
% References
\bibliography{uai2024-template}

\newpage

\onecolumn

\title{Statistical and Causal Robustness for Causal Null Hypothesis Tests\\(Supplementary Material)}
\maketitle

\appendix
\section{PROOFS OF THEOREMS}\label{app:proofs}

\begin{proof}[{\bfseries Proof of Theorem~\ref{thm:test_size}}]
Asymptotic linearity of $\bs\psi_{n}$  implies that $n^{1/2}(\bs\psi_{n} - \bs\psi_P) \to_d N(\bs{0}, \bs\Sigma_P)$, where $\bs\Sigma_P := \E[\bs\phi_P\bs\phi_P']$ is the asymptotic covariance matrix. Let $h : \d{R}^K \to \d{R}$ be defined pointwise as $h(x_1, x_2, \dotsc, x_K) := \prod_{k=1}^K x_k$. Then $h$ is a continuously differentiable function with $\frac{\partial h}{\partial x_k}(x_1, x_2, \dotsc, x_K) = \prod_{j\neq k } x_j$ for each $k$. Denoting the gradient mapping of $h$ by $\nabla h$, by the delta method,
\begin{align*}
 n^{1/2} \left[h(\bs\psi_{n}) - h(\bs\psi_P)\right] &= \textstyle n^{1/2}\left(\prod_{k=1}^K \psi_{k,n} - \prod_{k=1}^K \psi_{k,P}\right) \to_d N(0, \sigma_P^2)
 \end{align*}
 for
 \[
 \sigma_P^2 := \nabla h(\bs\psi_P)'\bs\Sigma_P \nabla h(\bs\psi_P) = \bs\gamma_P'\bs\Sigma_P\bs\gamma_P.\]
Since $\bs\Sigma_n \to_P \bs\Sigma_P$ by assumption and $\bs\gamma_n \to_P \bs\gamma_P$ by consistency of $\bs\psi_n$, the continuous mapping theorem \citep{mann1943stochastic} implies $\left(\bs\gamma_n' \bs\Sigma_n \bs\gamma_n\right)^{1/2} \to_P \left(\bs\gamma_P' \bs\Sigma_P \bs\gamma_P\right)^{1/2} = \sigma_P$, which is positive by assumption. Therefore, since $\prod_{k=1}^K \psi_{k,P} = 0$, Slutsky's theorem yields
 \[ \textstyle n^{1/2}\left(\bs\gamma_n' \bs\Sigma_n \bs\gamma_n\right)^{-1/2} \prod_{k=1}^K \psi_{k,n} \to_d N(0,1).\]
 The result follows.
\end{proof}
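
As an illustration of the variance formula in the proof above, consider the special case $K = 2$. This worked example is a sketch added for intuition and is not used elsewhere; the symbol $\Sigma_{jk}$, denoting the $(j,k)$ entry of $\bs\Sigma_P$, is notation introduced here for illustration only. In this case $h(x_1, x_2) = x_1 x_2$ and $\nabla h(x_1, x_2) = (x_2, x_1)'$, so that
\[
\sigma_P^2 = \psi_{2,P}^2 \Sigma_{11} + 2 \psi_{1,P} \psi_{2,P} \Sigma_{12} + \psi_{1,P}^2 \Sigma_{22}.
\]
If, for instance, $\psi_{1,P} = 0$ and $\psi_{2,P} \neq 0$, this reduces to $\sigma_P^2 = \psi_{2,P}^2 \Sigma_{11}$, which is positive whenever $\Sigma_{11} > 0$. If instead $\psi_{1,P} = \psi_{2,P} = 0$, then $\nabla h(\bs\psi_P) = \bs{0}$ and $\sigma_P^2 = 0$, so this case falls outside the positivity condition used in the proof above.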

\begin{proof}[{\bfseries Proof of Theorem~\ref{thm:test_power}}]
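By the continuous mapping theorem \citep{mann1943stochastic}, $\prod_{k=1}^K \psi_{k,n} \to_P \prod_{k=1}^K \psi_{k,P} \neq 0$ and $\bs\gamma_n \to_P \bs\gamma_P$. Since $\bs\Sigma_n = \bounded(1)$, $(\bs\gamma_n' \bs\Sigma_n \bs\gamma_n)^{1/2} = \bounded(1)$ as well. Hence $n^{1/2}\left|\prod_{k=1}^K \psi_{k,n}\right|$ diverges in probability while the normalizing term $(\bs\gamma_n' \bs\Sigma_n \bs\gamma_n)^{1/2}$ remains bounded in probability. Therefore, $|T_n| \to_P +\infty$, which yields the result.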
\end{proof}
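
As a heuristic complement to the proof above, the following sketch indicates the rate at which the test statistic diverges. It assumes that $T_n = n^{1/2}\left(\bs\gamma_n' \bs\Sigma_n \bs\gamma_n\right)^{-1/2} \prod_{k=1}^K \psi_{k,n}$, the statistic appearing in the proof of Theorem~\ref{thm:test_size}, and it further assumes conditions stronger than those used in the proof above, namely that $\bs\Sigma_n \to_P \bs\Sigma_P$ and $\sigma_P^2 = \bs\gamma_P' \bs\Sigma_P \bs\gamma_P > 0$. Under these assumptions, the continuous mapping theorem gives
\[
n^{-1/2}\, T_n = \frac{\prod_{k=1}^K \psi_{k,n}}{\left(\bs\gamma_n' \bs\Sigma_n \bs\gamma_n\right)^{1/2}} \to_P \frac{\prod_{k=1}^K \psi_{k,P}}{\sigma_P} \neq 0,
\]
so that $|T_n|$ grows at the rate $n^{1/2}$, with a constant that is larger when the product of the identified functionals is large relative to $\sigma_P$.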

% \clearpage

\section{EXAMPLE OF A CAUSAL MODEL VIOLATING FAITHFULNESS}

Figure~\ref{fig:faithful} shows an example of a causal model that violates faithfulness due to exact cancellation, and in which $\beta \neq 0$ but $\psi_{k,P} = 0$ when applying the backdoor formula with observed covariates. In this example, each variable is equal to a linear function of its direct causes and an independent noise term; e.g., $Y = 2A -2U + 4C + \epsilon_Y$. Here, the causal null $H_0$ that $A$ has no causal effect on $Y$ is false: the causal effect of $A$ on $Y$ is the coefficient $2$. This distribution violates faithfulness because $A$ and $Y$ are not d-separated given $C$ \citep{pearl2009causality}, but nevertheless it turns out that $Y\ci A \mid C$. To see this, we use Wright's rules of path analysis (assuming all variables are standardized) \citep{wright1921correlation} to find that $\text{Cor}(A, Y \mid  C) = -2\times 1 + 2=0$. Since $Y$ is given by a linear combination of its causes, this implies $Y\ci A \mid C$. Since the conditional independence $Y\ci A \mid C$ does not correspond to a property of the graph, it violates faithfulness. Furthermore, while the backdoor model with conditioning set $C$ is false due to the unblocked backdoor path through $U$, the observed data parameter identified by the backdoor model is given by $\psi_{1,P} = \text{Cor}(A, Y \mid  C) = 0$ as above. Hence, $\psi_{1,P} = 0$ even though $\beta \neq 0$, which is due to the violation of faithfulness.

 
\begin{figure}[h]
\centering
  \scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]
					
                    \begin{scope}
						\path[->, very thick]
						node[] (a) {$A$} 
						node[right of=a] (y) {$Y$}
						node[above left of=a, xshift=1cm] (c) {$C$}
                            node[above right of=y, xshift=-1cm] (u) {$U$}
					%	node[below of=a, yshift=0.8cm, xshift=1cm] (label)
                        node[below of = a, xshift = 1cm, yshift=1.6cm] (ay) {$2$}
                        node[right of=y, xshift=-1.5cm, yshift=0.5cm] (uy) {$-2$}
                        node[left of=u, xshift=0.9cm, yshift=-0.35cm] (ua) {$1$}
                        node[left of=a, xshift=+1.5cm, yshift=0.5cm] (ca) {$3$}
                        node[right of=c, xshift=-0.9cm, yshift=-0.35cm] (cy) {$4$}
						
						(c) edge[blue] (a)
						(a) edge[blue] (y)
						(c) edge[blue] (y)
                            (u) edge[red] (y)
                            (u) edge[red] (a)
						;
					\end{scope}
				\end{tikzpicture}
    }
		\caption{A causal model violating faithfulness. }
		\label{fig:faithful}
\end{figure}

\section{ADDITIONAL DETAILS FOR SIMULATION STUDIES}
\label{sec:appendix_dgp}

Here we provide details for the data-generating processes for the simulation studies presented in Section~\ref{sec:numerical}. The coefficient $\beta$ was set to 0 under the null and set to 10 under alternatives. We define $\n{expit}(x) := 1 / [1 + \exp(-x)]$ for $x \in \d{R}$. Throughout, ``$\n{Bern}(p)$'' is shorthand for the Bernoulli distribution with probability $p$, ``$\n{Unif}(a,b)$'' is shorthand for the continuous uniform distribution on the interval $[a,b]$, and ``$N(\mu, \sigma^2)$'' is shorthand for the normal distribution with mean $\mu$ and variance $\sigma^2$.

\subsection{Backdoor, Front-door, and IV Models}

\begin{figure*}[hbt!]
   \centering
    \begin{subfigure}[b]{0.32\linewidth}
    \centering
        \scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]
					
					\begin{scope}[]   
						\path[->, very thick]
                            node[] (z) {$Z$}
						node[right of=z] (a) {$A$}
						node[right of=a] (m) {$M$}
						node[right of=m] (y) {$Y$}
                            node[above right of=m, xshift=-1.4cm] (c) {$C$} 
                            node[above right of=y, xshift=-1.4cm] (u) {$U$}
                            node[below of=z, yshift=.9cm] ()
                   %         node[below of=m, yshift=0.8cm](label) {$(a)$}
                   
					  (z) edge[blue] (a)	
                        (a) edge[blue] (m)
                        (m) edge[blue] (y)
                        (c) edge[blue] (a)
                        (c) edge[blue] (m)
                        (c) edge[blue] (y)
                        (u) edge[blue] (y)
						;
					\end{scope}
                \end{tikzpicture}
            }
            \caption{}
            \label{subf:allcorrect}
    \end{subfigure}         
    \begin{subfigure}[b]{0.32\linewidth}
    \centering
            \scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]
                    \begin{scope}[xshift=7.25cm]
                    \path[->, very thick]
						node[] (z) {$Z$}
						node[right of=z] (a) {$A$}
						node[right of=a] (m) {$M$}
						node[right of=m] (y) {$Y$}
                            node[above right of=m, xshift=-1.4cm] (c) {$C$} 
                            node[above right of=y, xshift=-1.4cm] (u) {$U$}
                            node[below of=z, yshift=.9cm] ()
                    %        node[below of=m, yshift=0.8cm](label) {$(b)$}

                        (z) edge[blue] (a)
                        (a) edge[blue] (m)
                        (m) edge[blue] (y)
                        (c) edge[blue] (a)
                        (c) edge[blue] (m)
                        (c) edge[blue] (y)
                        (u) edge[blue] (y)
					  (u) edge[black] (a)
						;
						
					\end{scope} 
                \end{tikzpicture}
            }
        \caption{}
         \label{subf:bdoorwrong}
         \end{subfigure} 
    \begin{subfigure}[b]{0.32\linewidth}
     \centering
                    \scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]
                    \begin{scope}[]
					   \path[->, very thick]
						node[] (z) {$Z$}
						node[right of=z] (a) {$A$}
						node[right of=a] (m) {$M$}
						node[right of=m] (y) {$Y$}
                            node[above right of=m, xshift=-1.4cm] (c) {$C$} 
                            node[above right of=y, xshift=-1.4cm] (u) {$U$}
                            node[below of=z, yshift=.9cm] ()
                     %       node[below of=m, yshift=0.8cm](label) {$(d)$}
                            
					  (z) edge[blue] (a)	
                        (a) edge[blue] (m)
                        (m) edge[blue] (y)
                        (c) edge[blue] (a)
                        (c) edge[blue] (m)
                        (c) edge[blue] (y)
                        (u) edge[blue] (y)
					  (u) edge[black] (m)	
						;	
					\end{scope}
                \end{tikzpicture}
     }
     \caption{}
     \label{subf:ivwrong-exclusion}
    \end{subfigure}  
     
       \quad
       
    \begin{subfigure}[b]{0.32\linewidth}
     \centering
            \scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]
                    \begin{scope}[]
                    \path[->, very thick]
						node[] (z) {$Z$}
						node[right of=z] (a) {$A$}
						node[right of=a] (m) {$M$}
						node[right of=m] (y) {$Y$}
                            node[above right of=m, xshift=-1.4cm] (c) {$C$} 
                            node[above right of=y, xshift=-1.4cm] (u) {$U$}
                            node[above right of=a, xshift=-1.4cm] (u1) {$V$}
                            node[below of=z, yshift=.9cm] ()
                   %         node[below of=m, yshift=0.8cm](label) {$(g)$}

                        (z) edge[blue] (a)
                        (a) edge[blue] (m)
                        (m) edge[blue] (y)
                        (c) edge[blue] (a)
                        (c) edge[blue] (m)
                        (c) edge[blue] (y)
                        (u) edge[blue] (y)
					  (u) edge[black] (z)
                        (u1) edge[black] (y)
					  (u1) edge[black] (a)
						;						
					\end{scope}
     \end{tikzpicture}
     }
    \caption{}
    \label{subf:bdoorwrongivwrong}
    \end{subfigure}
    \begin{subfigure}[b]{0.32\linewidth}
     \centering
            \scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]
                    \begin{scope}[xshift=7.25cm]
                            \path[->, very thick]
						node[] (z) {$Z$}
						node[right of=z] (a) {$A$}
						node[right of=a] (m) {$M$}
						node[right of=m] (y) {$Y$}
                            node[above right of=m, xshift=-1.4cm] (c) {$C$} 
                            node[above right of=y, xshift=-1.4cm] (u) {$U$}
                            node[below of=z, yshift=.9cm] ()
                       %     node[below of=m, yshift=0.8cm](label) {$(h)$}
                            
					  (z) edge[blue] (a)	
                        (a) edge[blue] (m)
                        (m) edge[blue] (y)
                        (c) edge[blue] (a)
                        (c) edge[blue] (m)
                        (c) edge[blue] (y)
                        
                        (u) edge[blue] (y)
                        (z) edge[black, bend right=25] (y)
						;
						
					\end{scope}
     \end{tikzpicture}
    }
   \caption{}
   \label{subf:bdoorwrongivwrong-exclusion}
    \end{subfigure} 
    \begin{subfigure}[b]{0.32\linewidth}
   \centering
            \scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]
                    \begin{scope}[xshift=14.5cm]
                            \path[->, very thick]
						node[] (z) {$Z$}
						node[right of=z] (a) {$A$}
						node[right of=a] (m) {$M$}
						node[right of=m] (y) {$Y$}
                            node[above right of=m, xshift=-1.4cm] (c) {$C$} 
                            node[above right of=y, xshift=-1.4cm] (u) {$U$}
                            node[below of=z, yshift=.9cm] ()
                   %         node[below of=m, yshift=0.8cm](label) {$(c)$}
                            
					  (z) edge[blue] (a)	
                        (a) edge[blue] (m)
                        (c) edge[blue] (a)
                        (c) edge[blue] (m)
                        (c) edge[blue] (y)
                        (u) edge[blue] (y)
                        (a) edge[black, bend right=25] (y)
						;
						
					\end{scope}
					\end{tikzpicture}
     }
     \caption{}
     \label{subf:fdoorwrong-direct}
    \end{subfigure}

      \quad 
 
     \begin{subfigure}[b]{0.32\linewidth}
     \centering
            \scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]
                    \begin{scope}[xshift=7.25cm]
                            \path[->, very thick]
						node[] (z) {$Z$}
						node[right of=z] (a) {$A$}
						node[right of=a] (m) {$M$}
						node[right of=m] (y) {$Y$}
                            node[above right of=m, xshift=-1.4cm] (c) {$C$} 
                            node[above right of=y, xshift=-1.4cm] (u) {$U$}
                            node[below of=z, yshift=.5cm] ()
                   %         node[below of=m, yshift=0.8cm](label) {$(e)$}
                            
					  (z) edge[blue] (a)	
                        (a) edge[blue] (m)
                        (m) edge[blue] (y)
                        (c) edge[blue] (a)
                        (c) edge[blue] (m)
                        (c) edge[blue] (y)
                        (u) edge[blue] (y)
                        (z) edge[black, bend right=25] (y)
						;
						
					\end{scope}
     \end{tikzpicture}
     }
    \caption{}
    \label{subf:fdoorwrong-path}
    \end{subfigure} 
      \begin{subfigure}[b]{0.32\linewidth} 
     \centering
            \scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]
                    \begin{scope}[xshift=7.25cm]
					   \path[->, very thick]
						node[] (z) {$Z$}
						node[right of=z] (a) {$A$}
						node[right of=a] (m) {$M$}
						node[right of=m] (y) {$Y$}
                            node[above right of=m, xshift=-1.4cm] (c) {$C$} 
                            node[above right of=y, xshift=-1.4cm] (u) {$U$}
                            node[below of=m, yshift=.6cm](c5) {$C_5$}
                           % node[below of=m, yshift=0.4cm](label) {$(k)$}
                            
					  (z) edge[blue] (a)	
                        (a) edge[blue] (m)
                        (m) edge[blue] (y)
                        (c) edge[blue] (a)
                        (c) edge[blue] (m)
                        (c) edge[blue] (y)
                        (a) edge[black] (c5)
                        (y) edge[black] (c5)
                        (u) edge[blue] (y)
					  (u) edge[black] (m)	
						;	
					\end{scope}
                    \end{tikzpicture}
                }
                \caption{}
                \label{subf:fdoorwrongbdoorwrong-collider2}
            \end{subfigure} 
      \begin{subfigure}[b]{0.32\linewidth}
     \centering
            \scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]
                    \begin{scope}[xshift=14.5cm]
					   \path[->, very thick]
						node[] (z) {$Z$}
						node[right of=z] (a) {$A$}
						node[right of=a] (m) {$M$}
						node[right of=m] (y) {$Y$}
                            node[above right of=m, xshift=-1.4cm] (c) {$C$} 
                            node[above right of=y, xshift=-1.4cm] (u) {$U$}
                            node[below of=z, yshift=.9cm] ()
                          %  node[below of=m, yshift=0.8cm](label) {$(i)$}
                            
					  (z) edge[blue] (a)	
                        (a) edge[blue] (m)
                        (m) edge[blue] (y)
                        (c) edge[blue] (a)
                        (c) edge[blue] (m)
                        (c) edge[blue] (y)
                        (u) edge[blue] (y)
                        (u) edge[black] (a)
					  (u) edge[black] (m)	
						;	
					\end{scope}
                    
                    \end{tikzpicture}
                }
    \caption{}
    \label{subf:fdoorwrongbdoorwrong}
    \end{subfigure}
     
         \quad 
     \begin{subfigure}[b]{0.32\linewidth}
     \centering
            \scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]
				
					\begin{scope}[]
					   \path[->, very thick]
						node[] (z) {$Z$}
						node[right of=z] (a) {$A$}
						node[right of=a] (m) {$M$}
						node[right of=m] (y) {$Y$}
                            node[above right of=m, xshift=-1.4cm] (c) {$C$} 
                            node[above right of=y, xshift=-1.4cm] (u) {$U$}
                            node[below of=m, yshift=.6cm](c5) {$C_5$}
                     %       node[below of=m, yshift=0.4cm](label) {$(j)$}
                            
					  (z) edge[blue] (a)	
                        (a) edge[blue] (m)
                        (m) edge[blue] (y)
                        (c) edge[blue] (a)
                        (c) edge[blue] (m)
                        (c) edge[blue] (y)
                        (a) edge[black] (c5)
                        (y) edge[black] (c5)
                        (u) edge[black] (a)
					  (u) edge[black] (m)	
						;	
					\end{scope}
                \end{tikzpicture}
            }
    \caption{}
    \label{subf:fdoorwrongbdoorwrong-collider}
    \end{subfigure} 
    \begin{subfigure}[b]{0.32\linewidth}
     \centering
            \scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]
                    \begin{scope}[xshift=14.5cm]
                            \path[->, very thick]
						node[] (z) {$Z$}
						node[right of=z] (a) {$A$}
						node[right of=a] (m) {$M$}
						node[right of=m] (y) {$Y$}
                            node[above right of=m, xshift=-1.4cm] (c) {$C$} 
                            node[above right of=y, xshift=-1.4cm] (u) {$U$}
                             node[below of=z, yshift=.9cm] ()
                  %          node[below of=m, yshift=0.8cm](label) {$(f)$}
                            
					  (z) edge[blue] (a)	
                        (a) edge[blue] (m)
                        (m) edge[blue] (y)
                        (c) edge[blue] (a)
                        (c) edge[blue] (m)
                        (c) edge[blue] (y)
                        (u) edge[black] (m)
                        (u) edge[blue] (y)
                        (z) edge[black, bend right=25] (y)
						;
						
					\end{scope}
				\end{tikzpicture}
    }
    \caption{}
    \label{subf:fdoorwrongivwrong}
    \end{subfigure} 
    
    \caption{Causal DAGs for the data-generating distribution for the simulation with backdoor, front-door, and IV models. Violation of assumptions is shown via solid black edges.}% (a) All three models are valid. (b) Front-door and IV models are valid, but the backdoor model is invalid  because of an unblocked backdoor path from $A$ to $Y$ through $U$. (c) Backdoor and IV models are valid, but the front-door model is invalid because there is a direct effect of $A$ on $Y$. (d) Backdoor and IV models are valid, but the front-door model is invalid because of an unblocked backdoor path from $M$ to $Y$. (f) Backdoor and front-door models are valid (where the backdoor model conditions on $Z$), but the IV is invalid because there is a direct effect of $Z$ on $Y$. (f) Backdoor model is valid, but front-door and IV models are invalid because there is an unblocked backdoor path from $M$ to $Y$ and a direct effect of $Z$ on $Y$. (g) Front-door model is valid, but backdoor and IV models are invalid because there are unblocked backdoor paths from $A$ to $Y$ and from $Z$ to $Y$. (h) Front-door model is valid, but backdoor and IV models are invalid because there is an unblocked backdoor path from $A$ to $Y$ and a direct effect of $Z$ on $Y$. (i) IV model is valid, but backdoor and front-door models are invalid because there are unblocked backdoor paths from $A$ to $Y$, from $A$ to $M$, and from $M$ to $Y$. (j) IV model is valid but backdoor and front-door models are invalid because the backdoor adjustment set includes the collider $C_1$ and because there is an unblocked backdoor path from $A$ to $M$ through $U$. (k) Correct IV model but backdoor and front-door models don't hold because there is an unblocked backdoor path from M to Y and a collider.}
    \label{fig:back-front-door-iv-dgp}
\end{figure*}


%% Bottom right, all three valid, (a)

We begin with data-generating processes for the simulation study combining the backdoor, front-door, and IV models, the results of which are shown in Figure~\ref{fig: backdoor-frontdoor-iv} and discussed in Section~\ref{sec:numerical}. Figure~\ref{fig:back-front-door-iv-dgp} shows the causal DAGs for this simulation. Figure~\ref{subf:allcorrect} shows the causal DAG in the setting where all three models are valid, which was used to generate the lines in the bottom right panels under the null and alternative of Figure~\ref{fig: backdoor-frontdoor-iv}. The precise data-generating process for this setting is as follows. We first generate
\begin{align*}
U  &\sim \n{Unif}(-2,2) \\
C_i &\sim \n{Unif}(-2,2), \text{ for } i = 1,2,3,4 \\
Z &\sim \n{Bern}(0.5).
\end{align*}
We also define
\[\pi(c_1, c_2, c_3) = \n{expit}\left\{c_1 + \n{expit}(c_2) + \sin(c_3) \right\}.\]
We then simulate $\bar{A}(1) \sim \n{Bern}(\pi(C_1, C_2, C_3))$ and $\bar{A}(0) \sim \n{Bern}(1-\pi(C_1, C_2, C_3))$. To make the monotonicity assumption hold for the IV model, we then convert all defiers to compliers by setting $A(1) = 1$ and $A(0) = 0$ if $\bar{A}(1)=0$ and $\bar{A}(0) = 1$, and setting $A(1) = \bar{A}(1)$ and $A(0) = \bar{A}(0)$ otherwise. The observed treatment $A$ is then defined as $A = A(Z)$. Finally, we set
\begin{align*}
M &\sim \n{Bern}\left(\n{expit}\{5A - 1 + C_2\} \right)\\
Y &\sim N\left( \beta M + 3U + 2\sqrt{|C_1|} + \sin(C_4), 1\right).
\end{align*}
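
To make the effect of the monotonicity correction above concrete, the following calculation is a sketch under two additional assumptions that are not stated explicitly in the description: that $\bar{A}(1)$ and $\bar{A}(0)$ are drawn independently given $(C_1, C_2, C_3)$, and that $Z$ is drawn independently of all other variables. Writing $\pi = \pi(C_1, C_2, C_3)$ and conditioning on $(C_1, C_2, C_3)$ throughout, the corrected potential treatments satisfy
\begin{align*}
\Pr\{A(1) = 1, A(0) = 0\} &= \pi^2 + (1-\pi)^2 && \text{(compliers, including converted defiers)} \\
\Pr\{A(1) = 1, A(0) = 1\} &= \pi(1-\pi) && \text{(always-takers)} \\
\Pr\{A(1) = 0, A(0) = 0\} &= \pi(1-\pi) && \text{(never-takers)} \\
\Pr\{A(1) = 0, A(0) = 1\} &= 0 && \text{(defiers)}.
\end{align*}
These four probabilities sum to one. Under the same assumptions, $\Pr(A = 1 \mid Z = 1) - \Pr(A = 1 \mid Z = 0) = \pi^2 + (1-\pi)^2 \geq 1/2$, so the instrument has a nonzero effect on treatment at every covariate value.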

%% Bottom left, Backdoor false, front-door true, IV true, (b)

Figure~\ref{subf:bdoorwrong} shows the causal DAG in the setting where the front-door and IV models are valid, but the backdoor model is invalid due to an unblocked path from $A$ to $Y$ through $U$. This DAG was used to generate both lines in the bottom left panels under the null and alternative of Figure~\ref{fig: backdoor-frontdoor-iv}. The data-generating process for this setting when the identified backdoor functional is not zero under the null and alternative is the same as that described for (a) above, but we change $\pi$ to 
\[\pi(c_1, c_2, c_3, u) = \n{expit}\left\{c_1 + \n{expit}(c_2) + \sin(c_3) + u\right\}.\] 
The data-generating process for the setting when the identified backdoor functional equals zero under the null is the same as that described for (a) above, but we change the equations for $\pi$ and $Y$ to 
\begin{align*}
   \pi(c_1, c_2, c_3, u) &= \n{expit}\left\{c_1 + \n{expit}(c_2) + \sin(c_3) - u\right\}\\
    Y &\sim N\left(\beta M + U + 2\sqrt{|C_1|} + \sin(C_4), 1\right).
\end{align*}
The data-generating process for the setting when the identified backdoor functional equals zero under the alternative is the same as that described for (a) above, but we change the equations for $\pi$, $M$, and $Y$ to 
\begin{align*}
   \pi(c_1, c_2, z, u) &= \n{expit}\left\{-0.5 + 5z + c_1 +  \n{expit}(c_2) - 0.97u\right\}\\
   M &\sim \n{Bern}\left(\n{expit}\{2A - 1 + C_2\} \right)\\
    Y &\sim N\left(\beta M + 5U - 2\sqrt{|C_1|} + \sin(C_4), 1\right).
\end{align*}
Since $U$ has an effect on both $A$ and $Y$ but is not in the adjustment set for the backdoor model, the backdoor model is invalid.

%% Second-from-bottom right, Backdoor true, front-door true, IV false,

% (c): under null, = 0

Figure~\ref{subf:ivwrong-exclusion} shows the causal DAG in the setting where the backdoor and front-door models are valid, but the IV  model is invalid due to a direct effect of $Z$ on $Y$. This DAG was used to generate the line corresponding to ``Identified functional in wrong model $\neq$ 0" in the second-from-bottom right panel under the null of Figure~\ref{fig: backdoor-frontdoor-iv}.  The data-generating process for this setting is the same as that described for (a) above, but we change the equation for $Y$ to
\[ Y\sim N\left(\beta M + U + 2\sqrt{|C_1|} + \sin(C_4) + 2Z, 1 \right).\]
Since $Z$ now has a direct effect on $Y$, the IV model is invalid.

% under null, \neq 0

To simulate data where the front-door and backdoor models are valid, but the IV  model is invalid (second-from-bottom right panels of Figure~\ref{fig: backdoor-frontdoor-iv}) under the null when the identified functional in the IV model equals 0 and under the alternative, we make the IV model invalid by violating the monotonicity assumption. This violation does not have a graphical visualization, so it is not displayed in Figure~\ref{fig:back-front-door-iv-dgp}. The equations for $U$, $C$, $Z$, and $\pi$ are as described for setting (a) above. We then simulate $A(1) \sim \n{Bern}(\pi(C_1, C_2, C_3))$ and $A(0) \sim \n{Bern}(1-\pi(C_1, C_2, C_3))$, and we set $A = A(Z)$. Finally, we change the equations for $M$ and $Y$ to
\begin{align*}
    M &\sim \n{Bern}\left(\n{expit}\{\alpha_1A + \alpha_2I\{A(0) < A(1)\}A - 1 + C_2\} \right)\\
    Y &\sim N\left(\beta M + U + 2\sqrt{|C_1|} + \sin(C_4), 1\right).
\end{align*}
Here, we set $\alpha_1 = 5$ and $\alpha_2=-3$ under the null, we set $\alpha_1 = 5$ and $\alpha_2=-2.838$ under the alternative if the identified IV functional equals zero, and we set $\alpha_1 = 2$ and $\alpha_2 = 3$ under the alternative if the identified IV functional is not zero. Since there are ``defiers'' for whom $A(0) = 1$ but $A(1) = 0$, the IV model is invalid.


%% Second-from bottom left: Backdoor false, front-door true, IV false

% (d) null, \neq 0, 

Figure~\ref{subf:bdoorwrongivwrong} shows the causal DAG in the setting where the front-door model is valid, but the backdoor model is invalid due to an unblocked path from $A$ to $Y$ through $V$ and the IV model is invalid due to an unblocked path from $Z$ to $Y$ through $U$. This DAG was used to generate the line corresponding to ``Identified functional in wrong model $\neq$ 0'' in the second-from-bottom left panel under the null of Figure~\ref{fig: backdoor-frontdoor-iv}. The data-generating process for this setting is the same as that described for (a) above, but we add $V \sim \n{Unif}(-2,2)$ and change the equations for $Z$, $\pi$, $M$, and $Y$ to
\begin{align*}
    Z &\sim \n{Bern}\left(\n{expit}\{2 + 2U\}\right)\\
   \pi(c_1, c_2, c_3, v) &= \n{expit}\left\{c_1 + \n{expit}(c_2) + \sin(c_3) + v\right\}\\
    M &\sim \n{Bern}\left(\n{expit}\{2A - 1 + C_2\} \right)\\
    Y &\sim N\left(\beta M + 2U + V + 2\sqrt{|C_1|} + \sin(C_4), 1\right).
\end{align*}
Since $V$ has an effect on both $A$ and $Y$, but is not in the adjustment set for the backdoor model, the backdoor model is invalid. Since $U$ has an effect on both $Z$ and $Y$, but is not in the adjustment set for the IV model, the IV model is invalid.

To simulate data where the front-door model is valid but the backdoor and IV models are invalid (second-from-bottom left panels of Figure~\ref{fig: backdoor-frontdoor-iv}) under the null when ``Identified functional in wrong model = 0'' and under the alternative when ``Identified functional in wrong model = 0'', we make the IV model invalid by violating the monotonicity assumption. This violation does not have a graphical visualization, so it is not displayed in Figure~\ref{fig:back-front-door-iv-dgp}. The backdoor model is invalid due to an unblocked path from $A$ to $Y$ through $U$. The equations for $U$, $C$, and $Z$ are as described for setting (a) above. We change the equation for $\pi$ to
\[\pi(c_1, c_2, c_3, u) = \n{expit}(c_1 + \n{expit}(c_2) + \sin(c_3) + u).\]
We then simulate $A(1) \sim \n{Bern}(\pi(C_1, C_2, C_3,U))$ and $A(0) \sim \n{Bern}(1-\pi(C_1, C_2, C_3,U))$, and we set $A = A(Z)$. 
Finally, we change the equations for $M$ and $Y$ to
\begin{align*}
    M &\sim \n{Bern}\left(\n{expit}\{\alpha_1A + \alpha_2I\{A(0) < A(1)\}A  - 1 + C_2\} \right)\\
    Y &\sim N\left(\beta M + U + 2\sqrt{|C_1|} + \sin(C_4), 1\right).
\end{align*}
Here, we set $\alpha_1 = 5$ and $\alpha_2=-3$ under the null, and we set $\alpha_1 = 5$ and $\alpha_2 = -2.63$ under the alternative. Since $U$ has an effect on both $A$ and $Y$, but is not in the adjustment set for the backdoor model, the backdoor model is invalid. Since there are ``defiers'' for whom $A(0) = 1$ but $A(1) = 0$, the IV model is invalid. 

Figure~\ref{subf:bdoorwrongivwrong-exclusion} shows the causal DAG in the setting where the front-door model is valid, but the backdoor model is invalid due to an unblocked path from $A$ to $Y$ through $Z$ and the IV  model is invalid due to a direct effect of $Z$ on $Y$. This DAG was used to generate the line corresponding to ``Identified functional in wrong model $\neq$ 0" in the second-from-bottom left panel under the alternative of Figure~\ref{fig: backdoor-frontdoor-iv}.  The data-generating process for this setting is the same as that described for (a) above, but we change the equations for $\pi$, $M$, and $Y$ to
\begin{align*}
\pi(c_1, c_2, c_3, u) &= \n{expit}\left\{c_1 + \n{expit}(c_2) + \sin(c_3)\right\} \\
M &\sim \n{Bern}\left(\n{expit}\{2A - 1 + C_2\} \right)\\
Y &\sim N\left(\beta M + 3U + 2\sqrt{|C_1|} + \sin(C_4) + 2Z, 1\right).
\end{align*}
Since $Z$ has an effect on both $A$ and $Y$, but is not in the adjustment set for the backdoor model, the backdoor model is invalid. Since $Z$ has a direct effect on $Y$, the IV model is invalid. 


Figure~\ref{subf:fdoorwrong-direct} shows the causal DAG in the setting where the backdoor and IV models are valid, but the front-door model is invalid due to a direct effect of $A$ on $Y$.  This DAG was used to generate the line corresponding to ``Identified functional in wrong model $=$ 0" in the third-from-bottom right panels under the null and alternative of Figure~\ref{fig: backdoor-frontdoor-iv}. The data-generating process for this setting is the same as that described for (a) above, but we change the equation for $Y$  to
\[Y \sim N\left(\beta A + 3U + 2\sqrt{|C_1|} + \sin(C_4), 1 \right).\]
Since $A$ now has a direct effect on $Y$, the front-door model is invalid.

Figure~\ref{subf:fdoorwrong-path} shows the causal DAG in the setting where the backdoor and IV models are valid, but the front-door model is invalid due to an unblocked path from $M$ to $Y$ through $U$. This DAG was used to generate the line corresponding to ``Identified functional in wrong model $\neq$ 0" in the third-from-bottom right panels under the null and alternative of Figure~\ref{fig: backdoor-frontdoor-iv}. The data-generating process for this setting is the same as that described for (a) above, but we change the equation for $M$ to
\[M \sim \n{Bern}\left(\n{expit}\{3A - 1 + C_2 + U\} \right).\]
Since $U$ now has an effect on both $M$ and $Y$ but is not in the adjustment set for the front-door model, the front-door model is invalid.

Figure~\ref{subf:fdoorwrongbdoorwrong-collider2} shows the causal DAG in the setting where the IV model is valid, but the backdoor model is invalid due to controlling for the collider $C_5$ and the front-door  model is invalid due to an unblocked path from $M$ to $Y$ through $U$ and because $M$ does not fully mediate the effect of $A$ on $Y$. This DAG was used to generate the line corresponding to ``Identified functional in wrong model $\neq$ 0" in the third-from-bottom left panel under the null of Figure~\ref{fig: backdoor-frontdoor-iv}.  The data-generating process for this setting is the same as that described for (a) above, but we change the equations for $\pi$, $M$, and $Y$ to
\begin{align*}
\pi(c_1, c_2, c_3) &= \n{expit}\left\{c_4 + \n{expit}(c_2) + \sin(c_3) \right\} \\
M &\sim \n{Bern}(\n{expit}\{5A - 1 + C_2 + 2U\})\\
Y &\sim N\left(\beta M + U + \sin(C_4), 1\right)
\end{align*}
and we simulate $C_5  \sim N\left( 3A - Y, 1 \right)$. Since $C_5$ is an $A$-$Y$ collider and it is adjusted for in the backdoor model, the backdoor model is invalid. Since $U$ has an effect on both $M$ and $Y$ but is not in the adjustment set for the front-door model, the front-door model is invalid.


Figure~\ref{subf:fdoorwrongbdoorwrong} shows the causal DAG in the setting where the IV model is valid, but the backdoor model is invalid due to an unblocked path from $A$ to $Y$ through $U$ and the front-door  model is invalid due to an unblocked path from $M$ to $Y$ through $U$. This DAG was used to generate the line corresponding to ``Identified functional in wrong model = 0" in the third-from-bottom left panel under the null and the line corresponding to ``Identified functional in wrong model $\neq$ 0" in the third-from-bottom left panel under the alternative of Figure~\ref{fig: backdoor-frontdoor-iv}.  The data-generating process for this setting is the same as that described for (a) above, but we change the equations for $\pi$ and $M$ to
\begin{align*}
\pi(c_1, c_2, c_3, u) &= \n{expit}\left\{c_1 + \n{expit}(c_2) + \sin(c_3) + u\right\} \\
M &\sim \n{Bern}\left(\n{expit}\{3A - 1 + C_2 + U\} \right).
\end{align*}
Since $U$ has an effect on both $A$ and $Y$, but is not in the adjustment set for the backdoor model, the backdoor model is invalid. Since $U$ has an effect on both $M$ and $Y$, but is not included in the adjustment set for the front-door model, the front-door model is invalid.

Figure~\ref{subf:fdoorwrongbdoorwrong-collider} shows the causal DAG in the setting where the IV model is valid, but the backdoor model is invalid due to controlling for the collider $C_5$ and the front-door model is invalid due to an unblocked path from $A$ to $M$ through $U$ and because $M$ does not fully mediate the effect of $A$ on $Y$. This DAG was used to generate the line corresponding to ``Identified functional in wrong model = 0" in the third-from-bottom left panel under the alternative of Figure~\ref{fig: backdoor-frontdoor-iv}.  The data-generating process for this setting is the same as that described for (a) above, but we change the equations for $\pi$, $M$, and $Y$ to
\begin{align*}
\pi(c_1, c_2, c_3, u) &= \n{expit}\left\{c_4 + \sin(c_3) - u\right\} \\
M &\sim \n{Bern}\left\{\n{expit}(5A - 1 + C_2 - 2U) \right\}\\
Y &\sim N\left(\beta M - 5\sin(C_4), 1\right)
\end{align*}
and we simulate $C_5 \sim N\left( -2A - 5Y, 1 \right)$. Since $C_5$ is an $A$-$Y$ collider and it is adjusted for in the backdoor model, the backdoor model is invalid.  Since $U$ has an effect on both $A$ and $M$ but is not in the adjustment set for the front-door model, the front-door model is invalid.






Figure~\ref{subf:fdoorwrongivwrong} shows the causal DAG in the setting where the backdoor model is valid, but the front-door model is invalid due to an unblocked path from $M$ to $Y$ through $U$ and the IV  model is invalid due to a direct effect of $Z$ on $Y$. This DAG was used to generate the line corresponding to ``Identified functional in wrong model $\neq$ 0" in the top right panel under the null of Figure~\ref{fig: backdoor-frontdoor-iv}.  The data-generating process for this setting is the same as that described for (a) above, but we change the equations for $M$ and $Y$ to
\begin{align*}
M &\sim \n{Bern}\left(\n{expit}\{2A - 1 + C_2 + U\} \right)\\
Y &\sim N\left(\beta M - 3U + 2\sqrt{|C_1|} + \sin(C_4) + 2Z, 1\right).
\end{align*}
Since $U$ has an effect on both $M$ and $Y$ but is not in the adjustment set for the front-door model, the front-door model is invalid. Since $Z$ has a direct effect on $Y$, the IV model is invalid. We note that $Z$ is included in the adjustment set for the backdoor model, since otherwise there would be an unblocked path from $A$ to $Y$ through $Z$.

To simulate data where the backdoor model is valid but the front-door and IV models are invalid (top right panels of Figure~\ref{fig: backdoor-frontdoor-iv}) under the null when ``Identified functional in wrong model = 0" and under the alternative when ``Identified functional in wrong model $\neq$ 0", we make the IV model invalid by violating the monotonicity assumption. As above, this violation does not have a graphical visualization, so it is not displayed in Figure~\ref{fig:back-front-door-iv-dgp}. The front-door model is invalid due to an unblocked path from $M$ to $Y$ through $U$. The equations for $U$, $C$, $Z$, and $\pi$ are as described for setting (a) above. We then simulate $A(1) \sim \n{Bern}(\pi(C_1, C_2, C_3))$ and $A(0) \sim \n{Bern}(1-\pi(C_1, C_2, C_3))$, and we set $A = A(Z)$. We also change the equations for $M$ and $Y$ to
\begin{align*}
    M &\sim I\{A(0) < A(1)\}\n{Bern}\left(\n{expit}\{5A - 1 + C_2 + U\} \right) + I\{A(0) \geq A(1)\}\n{Bern}\left(\n{expit}\{2A - 1 + C_2 + U\} \right) \\
 Y &\sim N\left( \beta M - 3U + 2\sqrt{|C_1|} + \sin(C_4), 1 \right).
\end{align*}
Since $U$ has an effect on both $M$ and $Y$, but is not in the adjustment set for the front-door model, the front-door model is invalid. Since there are ``defiers" for whom $A(0) = 1$ but $A(1) = 0$, the IV model is invalid. 

To simulate data where the backdoor model is valid but the front-door and IV models are invalid in the top right panel under the alternative of Figure~\ref{fig: backdoor-frontdoor-iv} when ``Identified functional in wrong model = 0", we make the IV model invalid by violating the monotonicity assumption. As above, this violation does not have a graphical visualization, so it is not displayed in Figure~\ref{fig:back-front-door-iv-dgp}. The front-door model is invalid due to a direct effect of $A$ on $Y$. The equations for $U$, $C$, $Z$, and $\pi$ are as described for setting (a) above. We then simulate $A(1) \sim \n{Bern}(\pi(C_1, C_2, C_3))$ and $A(0) \sim \n{Bern}(1-\pi(C_1, C_2, C_3))$, and we set $A = A(Z)$. We also change the equations for $M$ and $Y$ to
\begin{align*}
    M &\sim I\{A(0) < A(1)\}\n{Bern}\left(\n{expit}\{2A - 1 + C_2\} \right) + I\{A(0) \geq A(1)\}\n{Bern}\left(\n{expit}\{5A - 1 + C_2\} \right) \\
    Y &\sim N\left( \beta A + 3U + 2\sqrt{|C_1|} + \sin(C_4), 1 \right).
\end{align*}



\subsection{Backdoor and Front-door Models}

\begin{figure*}[t]
    \centering
    \begin{subfigure}[b]{0.24\linewidth}
        \centering
			\scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]
					
					\begin{scope}[]   
						\path[->, very thick]
						node[] (a) {$A$}
						node[right of=a] (m) {$M$}
						node[right of=m] (y) {$Y$}
                            node[above right of=m, xshift=-1.4cm] (c) {$C$} 
                            node[above right of=y, xshift=-1.4cm] (u) {$U$}
                            node[below of=a, yshift=1.3cm] ()
                       %     node[below of=m, yshift=0.8cm](label) {$(a)$}
						
                        (a) edge[blue] (m)
                        (m) edge[blue] (y)
                        (c) edge[blue] (a)
                        (c) edge[blue] (m)
                        (c) edge[blue] (y)
                        (u) edge[blue] (y)
						;
					\end{scope}
     \end{tikzpicture}
     }
     \caption{}
     \label{subf:bdoorfdoor-allcorrect}
     \end{subfigure}
     \begin{subfigure}[b]{0.24\linewidth}
        \centering
			\scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]	
					% \begin{scope}[xshift=5.5cm]
                    \begin{scope}[xshift=5.25cm]          \path[->, very thick]
						node[] (a) {$A$}
						node[right of=a] (m) {$M$}
						node[right of=m] (y) {$Y$}
                            node[above right of=m, xshift=-1.4cm] (c) {$C$} 
                            node[above right of=y, xshift=-1.4cm] (u) {$U$}
                            node[below of=a, yshift=1.3cm] ()
                           % node[below of=m, yshift=0.8cm](label) {$(b)$}
						
                        (a) edge[blue] (m)
                        (m) edge[blue] (y)
                        (c) edge[blue] (a)
                        (c) edge[blue] (m)
                        (c) edge[blue] (y)
                        (u) edge[blue] (y)
					  (u) edge[black] (a)
						;
						
					\end{scope}
     \end{tikzpicture}
     }
     \caption{}
     \label{subf:bdoorfdoor-bdoorwrong}
     \end{subfigure}
    \begin{subfigure}[b]{0.24\linewidth}
        \centering
			\scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]
					\begin{scope}[xshift=10.75cm]
                            \path[->, very thick]
						node[] (a) {$A$}
						node[right of=a] (m) {$M$}
						node[right of=m] (y) {$Y$}
                            node[above right of=m, xshift=-1.4cm] (c) {$C$} 
                            node[above right of=y, xshift=-1.4cm] (u) {$U$}
                            node[below of=a, yshift=1.3cm] ()
                          %  node[below of=m, yshift=0.8cm](label) {$(c)$}
						
                        (a) edge[blue] (m)
                        (c) edge[blue] (a)
                        (c) edge[blue] (m)
                        (c) edge[blue] (y)
                        (u) edge[blue] (y)
                        (a) edge[black, bend right=25] (y)
						;
						
					\end{scope}
     \end{tikzpicture}
     }
     \caption{}
     \label{subf:bdoorfdoor-fdoorwrong}
     \end{subfigure}
     \begin{subfigure}[b]{0.24\linewidth}
        \centering
			\scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]
                    \begin{scope}[xshift=16cm]
					   \path[->, very thick]
						node[] (a) {$A$}
						node[right of=a] (m) {$M$}
						node[right of=m] (y) {$Y$}
                            node[above right of=m, xshift=-1.4cm] (c) {$C$} 
                            node[above right of=y, xshift=-1.4cm] (u) {$U$}
                            node[below of=a, yshift=1.3cm] ()
                           % node[below of=m, yshift=0.8cm](label) {$(d)$}
						
                        (a) edge[blue] (m)
                        (m) edge[blue] (y)
                        (c) edge[blue] (a)
                        (c) edge[blue] (m)
                        (c) edge[blue] (y)
                        (u) edge[blue] (y)
					  (u) edge[black] (m)	
						;	
					\end{scope}
				
				\end{tikzpicture}
			}
   \caption{}
   \label{subf:bdoorfdoor-fdoorwrong2}
		\end{subfigure}
		\caption{Causal DAGs for the data-generating distribution for the simulation with backdoor and front-door models. Violation of assumptions is shown via solid black edges. }%$K = 2$ with backdoor and front-door models. Violation of assumptions is shown via red dashed edges. (a) Plausible causal models; (b) Correct front-door model, but the backdoor model doesn't hold because of an unblocked backdoor path; (c) Correct backdoor model, but the front-door model doesn't hold because M doesn't intercept all directed paths from A to Y; (d) Correct backdoor model, but the front-door model doesn't hold because of an unblocked backdoor path from M to Y. }
		\label{fig:back-front-door-dgp}
\end{figure*}


We next present the  data-generating processes for the simulation study combining the backdoor and front-door models, the results of which are shown in Figure~\ref{fig: backdoor-frontdoor}. Figure~\ref{fig:back-front-door-dgp} shows the causal DAGs for this simulation. Figure~\ref{subf:bdoorfdoor-allcorrect} shows the causal DAG in the setting where both models are valid, which was used to generate the lines in the bottom right panels under the null and alternative of Figure~\ref{fig: backdoor-frontdoor}. The precise data-generating process for this setting is as follows. We generate
\begin{align*}
U  &\sim \n{Unif}(-2,2) \\
C_i &\sim \n{Unif}(-2,2), \text{ for } i = 1,2,3,4 \\
A &\sim \n{Bern}\left(\n{expit}\left\{C_1 + \n{expit}(C_2) + \sin(C_3) \right\}\right) \\
M &\sim \n{Bern}\left(\n{expit}\{2A - 1 + C_2\} \right)\\
Y &\sim N\left( \beta M + 2U + 2\sqrt{|C_1|} + \sin(C_4), 1\right).
\end{align*}

Figure~\ref{subf:bdoorfdoor-bdoorwrong} shows the causal DAG  in the setting where the front-door model is valid, but the backdoor model is invalid due to an unblocked path from $A$ to $Y$ through $U$. This DAG was used to generate both lines in the bottom left panels under the null and alternative of Figure~\ref{fig: backdoor-frontdoor}.  The data-generating process for this setting when ``Identified functional in wrong model $\neq$ 0" under the null and under the alternative is the same as that described for (a) above, but we change the formula for $A$ to
\[A \sim \n{Bern}\left(\n{expit}\left\{C_1 + \n{expit}(C_2) + \sin(C_3) + U\right\}\right).\] 
The data-generating process for this setting when ``Identified functional in wrong model = 0" under the null is the same as that described for (a) above, but we change the equations for $A$, $M$, and $Y$ to
\begin{align*}
A &\sim \n{Bern}\left(\n{expit}\left\{C_1 + \n{expit}(C_2) + \sin(C_3) - 0.05U \right\}\right) \\
M &\sim \n{Bern}\left(\n{expit}\{5A - 1 + C_2\} \right)\\
Y &\sim N\left( \beta M + 0.05U + 2\sqrt{|C_1|} + \sin(C_4), 1\right).
\end{align*}
The data-generating process for this setting when ``Identified functional in wrong model = 0" under the alternative is the same as that described for (a) above, but we change the equations for $A$, $M$, and $Y$ to
\begin{align*}
A &\sim \n{Bern}\left(\n{expit}\left\{C_1 - \n{expit}(C_2) - \sin(C_3) + 0.6U \right\}\right) \\
M &\sim \n{Bern}\left(\n{expit}\{0.37A - 1 + C_2\} \right)\\
Y &\sim N\left( \beta M - 0.9U + 2\sqrt{|C_1|} + \sin(C_4), 1\right).
\end{align*}
Since $U$ has an effect on both $A$ and $Y$ but is not in the adjustment set for the backdoor model, the backdoor model is invalid.

Figure~\ref{subf:bdoorfdoor-fdoorwrong} shows the causal DAG  in the setting where the backdoor model is valid, but the front-door model is invalid due to a direct effect of $A$ on $Y$. This DAG was used to generate the line corresponding to ``Identified functional in wrong model = 0" in the top right panels under the null and alternative of Figure~\ref{fig: backdoor-frontdoor}.  The data-generating process for this setting is the same as that described for (a) above, but we change the formula for $Y$ to
\[Y \sim N\left( \beta A + 2U + 2\sqrt{|C_1|} + \sin(C_4), 1\right).\] 
Since $A$ has a direct effect on $Y$, the front-door model is invalid.
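
The following sketch indicates why the identified front-door functional can equal zero in this setting. It assumes that the front-door functional takes the standard covariate-adjusted form below; the notation $\psi_{\mathrm{FD}}$, $p$, and $E$ is ours and may differ in detail from the definition used in the main text. Let
\begin{align*}
\psi_{\mathrm{FD}} &= E\left[\sum_{m} \left\{p(m \mid A = 1, C) - p(m \mid A = 0, C)\right\} g(m, C)\right], \\
g(m, C) &= \sum_{a'} E(Y \mid M = m, A = a', C)\, p(a' \mid C).
\end{align*}
In the data-generating process above, $M$ depends only on $(A, C_2)$ and independent noise, while $Y$ depends only on $(A, U, C)$ and independent noise, with $U$ independent of $(A, M, C)$. Hence $Y$ is conditionally independent of $M$ given $(A, C)$, so $g(m, C) = E(Y \mid C)$ does not depend on $m$, and
\begin{align*}
\psi_{\mathrm{FD}} &= E\left[E(Y \mid C) \sum_{m} \left\{p(m \mid A = 1, C) - p(m \mid A = 0, C)\right\}\right] = E\left[E(Y \mid C)\,(1 - 1)\right] = 0,
\end{align*}
regardless of the value of $\beta$.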

Figure~\ref{subf:bdoorfdoor-fdoorwrong2} shows the causal DAG  in the setting where the backdoor model is valid, but the front-door model is invalid due to an unblocked path from $M$ to $Y$ through $U$. This DAG was used to generate the line corresponding to ``Identified functional in wrong model $\neq$ 0" in the top right panels under the null and alternative of Figure~\ref{fig: backdoor-frontdoor}.  The data-generating process for this setting is the same as that described for (a) above, but we change the formula for $M$ to
\[M \sim \n{Bern}\left(\n{expit}\{2A - 1 + C_2 +U\} \right).\] 
Since $U$ has an effect on both $M$ and $Y$, but is not included in the adjustment set for the front-door model, the front-door model is invalid.

\subsection{Backdoor and IV Models}

\begin{figure*}[hbt!]
    \centering
     \begin{subfigure}[b]{0.24\linewidth}
        \centering
			\scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]
			
					\begin{scope}[]   
						\path[->, very thick]
                            node[] (z) {$Z$}
						node[right of=z] (a) {$A$}
						node[right of=a] (y) {$Y$}
                            node[above right of=a, xshift=-1.4cm] (c) {$C$} 
                            node[above right of=y, xshift=-1.4cm] (u) {$U$}
                           % node[below of=a, yshift=0.4cm](label) {$(a)$}
                            node[below of=a, yshift=.6cm]()

                            
					  (z) edge[blue] (a)	
                        (a) edge[blue] (y)                      
                        (c) edge[blue] (a)
                        (c) edge[blue] (y)
                        (u) edge[blue] (y)
						;
					\end{scope}
      \end{tikzpicture}
     }
     \caption{}
     \label{subf:bdooriv-allcorrect}
     \end{subfigure}		
		 \begin{subfigure}[b]{0.24\linewidth}
        \centering
			\scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]			
                    \begin{scope}[xshift=5.25cm]                        \path[->, very thick]
                            node[] (z) {$Z$}
						node[right of=z] (a) {$A$}
						node[right of=a] (y) {$Y$}
                            node[above right of=a, xshift=-1.4cm] (c) {$C$} 
                            node[above right of=y, xshift=-1.4cm] (u) {$U$}
                          %  node[below of=a, yshift=0.4cm](label) {$(b)$}
                            node[below of=a, yshift=.6cm]()
                            
					  (z) edge[blue] (a)	
                        (a) edge[blue] (y)                      
                        (c) edge[blue] (a)
                        (c) edge[blue] (y)
                        (u) edge[blue] (y)
                        (u) edge[black] (a)
						;						
					\end{scope}
 \end{tikzpicture}
     }
     \caption{}
      \label{subf:bdooriv-bdoorwrong}
     \end{subfigure}
      \begin{subfigure}[b]{0.24\linewidth}
        \centering
			\scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]
					\begin{scope}[xshift=10.75cm]
                            \path[->, very thick]
                            node[] (z) {$Z$}
						node[right of=z] (a) {$A$}
						node[right of=a] (y) {$Y$}
                            node[above right of=a, xshift=-1.4cm] (c) {$C$} 
                            node[above right of=y, xshift=-1.4cm] (u) {$U$}
                            node[below of=a, yshift=.6cm](c5) {$C_5$}
                         %   node[below of=a, yshift=0.4cm](label) {$(c)$}
                         
					  (z) edge[blue] (a)	
                        (a) edge[blue] (y)                      
                        (c) edge[blue] (a)
                        (c) edge[blue] (y)
                        (a) edge[black] (c5)
                        (y) edge[black] (c5)
                        (u) edge[blue] (y)
						;
						
					\end{scope}
 \end{tikzpicture}
     }
     \caption{}
      \label{subf:bdooriv-bdoorwrong-collider}
     \end{subfigure}
 \begin{subfigure}[b]{0.24\linewidth}
        \centering
			\scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]
                    \begin{scope}[xshift=16cm]
					   \path[->, very thick]
                            node[] (z) {$Z$}
						node[right of=z] (a) {$A$}
						node[right of=a] (y) {$Y$}
                            node[above right of=a, xshift=-1.4cm] (c) {$C$} 
                            node[above right of=y, xshift=-1.4cm] (u) {$U$}
                            node[below of=a, yshift=.6cm]()
                        %    node[below of=a, yshift=0.4cm](label) {$(d)$}
                        
					  (z) edge[blue] (a)	
                        (a) edge[blue] (y)                      
                        (c) edge[blue] (a)
                        (c) edge[blue] (y)
                        (u) edge[blue] (y)
                        (z) edge[black, bend right=25] (y)
						;	
					\end{scope}
				
				\end{tikzpicture}
			}
        \caption{}
         \label{subf:bdooriv-ivwrong}
        \end{subfigure}
		\caption{Causal DAGs for the data-generating distribution for the simulation with backdoor and IV models. Violation of assumptions is shown via solid black edges.}%(a) Plausible causal models; (b) Correct IV model, but the backdoor model doesn't hold because of an unblocked backdoor path from A to Y; (c) Correct IV model, but the backdoor model doesn't hold because of a collider; (d) Correct backdoor model, but the IV model doesn't hold because there is a direct effect from Z to Y. }
		\label{fig:back-iv-dgp}
\end{figure*}

We next present the  data-generating processes for the simulation study combining the backdoor and IV models, the results of which are shown in Figure~\ref{fig: backdoor-iv}. Figure~\ref{fig:back-iv-dgp} shows the causal DAGs for this simulation. Figure~\ref{subf:bdooriv-allcorrect} shows the causal DAG in the setting where both models are valid, which was used to generate the lines in the bottom right panels under the null and alternative of Figure~\ref{fig: backdoor-iv}. The precise data-generating process for this setting is as follows. We first generate
\begin{align*}
U  &\sim \n{Unif}(-2,2) \\
C_i &\sim \n{Unif}(-2,2), \text{ for } i = 1,2,3,4 \\
Z &\sim \n{Bern}(0.5).
\end{align*}
We also define
\[\pi(c_1, c_2, c_3) = \n{expit}\left\{c_1 + \n{expit}(c_2) + \sin(c_3) \right\}.\]
We then simulate $\bar{A}(1) \sim \n{Bern}(\pi(C_1, C_2, C_3))$ and $\bar{A}(0) \sim \n{Bern}(1-\pi(C_1, C_2, C_3))$. As above, to make the monotonicity assumption hold for the IV model, we then convert all defiers to compliers by setting $A(1) = 1$ and $A(0) = 0$ if $\bar{A}(1)=0$ and $\bar{A}(0) = 1$, and setting $A(1) = \bar{A}(1)$ and $A(0) = \bar{A}(0)$ otherwise. The observed treatment $A$ is then defined as $A = A(Z)$. Finally, we set
\begin{align*}
Y &\sim N\left( \beta A + 2U + 2\sqrt{|C_1|} + \sin(C_4), 1\right).
\end{align*}
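
For intuition, the effect of the defier-to-complier conversion on the treatment probabilities can be worked out under two additional assumptions that are ours and are not stated above: that $\bar{A}(1)$ and $\bar{A}(0)$ are drawn independently given $C$, and that $Z$ is drawn independently of $(C, U, \bar{A}(0), \bar{A}(1))$. Writing $\pi = \pi(C_1, C_2, C_3)$ and $P$ for probability under the data-generating distribution, the conversion would give
\begin{align*}
P\{A(1) = 1, A(0) = 0 \mid C\} &= \pi^2 + (1-\pi)^2 && \text{(compliers)}\\
P\{A(1) = 0, A(0) = 1 \mid C\} &= 0 && \text{(defiers)}\\
P\{A(1) = A(0) = 1 \mid C\} &= \pi(1-\pi) && \text{(always-takers)}\\
P\{A(1) = A(0) = 0 \mid C\} &= \pi(1-\pi) && \text{(never-takers)},
\end{align*}
so that
\begin{align*}
P(A = 1 \mid Z = 1, C) &= \pi^2 + (1-\pi)^2 + \pi(1-\pi) = 1 - \pi(1-\pi), \\
P(A = 1 \mid Z = 0, C) &= \pi(1-\pi), \\
P(A = 1 \mid Z = 1, C) - P(A = 1 \mid Z = 0, C) &= 1 - 2\pi(1-\pi) \geq \tfrac{1}{2}.
\end{align*}
Under these assumptions, the instrument shifts the treatment probability by at least $1/2$ at every covariate value.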

Figure~\ref{subf:bdooriv-bdoorwrong} shows the causal DAG  in the setting where  the IV model is valid, but the backdoor model is invalid due to an unblocked  path from $A$ to $Y$ through $U$. This DAG was used to generate the line corresponding to ``Identified functional in wrong model = 0" in the bottom left panel under the null  of Figure~\ref{fig: backdoor-iv}.  The data-generating process for this setting is the same as that described for (a) above, but we change the formula for $\pi$ to
\[\pi(c_1, c_2, c_3, u) = \n{expit}\left\{c_1 + \n{expit}(c_2) + \sin(c_3) + u\right\}.\] 
Since $U$ has an effect on both $A$ and $Y$ but is not in the adjustment set for the backdoor model, the backdoor model is invalid.

Figure~\ref{subf:bdooriv-bdoorwrong-collider} shows the causal DAG  in the setting where the IV model is valid, but the backdoor model is invalid due to controlling for the collider $C_5$. This DAG was used to generate the lines corresponding to ``Identified functional in wrong model $\neq$ 0" in the bottom left panels under the null and alternative of Figure~\ref{fig: backdoor-iv} as well as the line corresponding to ``Identified functional in wrong model $=$ 0" in the bottom left panel under the alternative of Figure~\ref{fig: backdoor-iv}. The data-generating process for this setting when ``Identified functional in wrong model $\neq$ 0" under the null is the same as that described for (a) above, but we change the equations for $\pi$ and $Y$ to
\begin{align*}
    \pi(c_1, c_2, c_3) &= \n{expit}\left\{c_4 + \n{expit}(c_2) + \sin(c_3) \right\}\\
    Y &\sim N\left( \beta A + 2U + \sin(C_4), 1\right).
\end{align*}
We then simulate $C_5$ as
\begin{align*}
C_5 &\sim N\left( 2A + Y, 1\right).
\end{align*}
The data-generating process for this setting when ``Identified functional in wrong model $\neq$ 0" under the alternative is the same as that described for (a) above, but we change the equations for $\pi$ and $Y$ to
\begin{align*}
    \pi(c_1, c_2, c_3) &= \n{expit}\left\{c_4 + \n{expit}(c_2) + \sin(c_3) \right\}\\
    Y &\sim N\left( \beta A + 2U + \sin(C_4), 1\right).
\end{align*}
We then simulate $C_5$ as
\begin{align*}
C_5 &\sim N\left( A + Y, 1\right).
\end{align*}
The data-generating process for this setting when ``Identified functional in wrong model = 0" under the alternative is the same as that described for (a) above, but we change the equations for $\pi$ and $Y$ to
\begin{align*}
    \pi(c_1, c_2, c_3) &= \n{expit}\left\{c_4 + \n{expit}(c_2) + \sin(c_3) \right\}\\
    Y &\sim N\left( \beta A - 3U - \sin(C_4), 1\right).
\end{align*}
We then simulate $C_5$ as
\begin{align*}
C_5 &\sim N\left( 0.6A + 2Y, 1\right).
\end{align*}
Since $C_5$ is a collider and is included in the adjustment set for the backdoor model, the backdoor model is invalid.

Figure~\ref{subf:bdooriv-ivwrong} shows the causal DAG in the setting where the backdoor model is valid, but the IV model is invalid due to a direct effect of $Z$ on $Y$. This DAG was used to generate the line corresponding to ``Identified functional in wrong model $\neq$ 0" in the top right panel under the null of Figure~\ref{fig: backdoor-iv}. The data-generating process for this setting is the same as that described for (a) above, but we change the equation for $Y$ to
\begin{align*}
    Y &\sim N\left( \beta A + 2U + 2\sqrt{|C_1|} + \sin(C_4) + 2Z, 1\right).
\end{align*}
Since $Z$ has a direct effect on $Y$, the IV model is invalid.
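
To illustrate how this direct effect can appear in an IV-type contrast, consider the unadjusted Wald ratio; this is only a simple illustration and may differ from the covariate-adjusted IV functional used in the simulation. We assume, beyond what is stated above, that $Z$ is drawn independently of $(U, C)$ and of the potential treatments, and that $E(A \mid Z = 1) \neq E(A \mid Z = 0)$, where $E$ denotes expectation under the data-generating distribution. Then the terms $2U + 2\sqrt{|C_1|} + \sin(C_4)$ have the same mean in both instrument arms, and
\begin{align*}
E(Y \mid Z = 1) - E(Y \mid Z = 0) &= \beta\left\{E(A \mid Z = 1) - E(A \mid Z = 0)\right\} + 2, \\
\frac{E(Y \mid Z = 1) - E(Y \mid Z = 0)}{E(A \mid Z = 1) - E(A \mid Z = 0)} &= \beta + \frac{2}{E(A \mid Z = 1) - E(A \mid Z = 0)}.
\end{align*}
The second term does not vanish, so this ratio differs from $\beta$; in particular, it is nonzero when $\beta = 0$.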

To simulate data where the backdoor model is valid but the IV model is invalid (top right panels of Figure~\ref{fig: backdoor-iv}) under the null when the identified functional in the IV model equals 0 and under both cases for the alternative, we make the IV model invalid by violating the monotonicity assumption. As above, this violation does not have a graphical visualization, so it is not displayed in Figure~\ref{fig:back-iv-dgp}. The equations for $U$, $C$, $Z$, and $\pi$ are as described for setting (a) above. We then simulate $A(1) \sim \n{Bern}(\pi(C_1, C_2, C_3))$ and $A(0) \sim \n{Bern}(1-\pi(C_1, C_2, C_3))$, and we set $A = A(Z)$. We also change the equation for $Y$ to 
\begin{align*}
    Y &\sim N\left(\beta_1 A + \beta_2I\{A(0) > A(1)\}A + 2U + 2\sqrt{|C_1|} + \sin(C_4), 1\right).
\end{align*}
Here, we set $\beta_1 = 0$ and $\beta_2 = 0$ under the null, we set $\beta_1 = 5.75$ and $\beta_2=4.25$ under the alternative if the identified IV functional equals zero, and we set $\beta_1 = 10$ and $\beta_2 = -8$ under the alternative if the identified IV functional is not zero. Since there are ``defiers" for whom $A(0) = 1$ but $A(1) = 0$, the IV model is invalid.



\subsection{Front-door and IV Models}

\begin{figure*}[hbt!]
    \centering
	\begin{subfigure}[b]{0.49\linewidth}
                    \centering
                    \scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]
					
					\begin{scope}[]   
						\path[->, very thick]
                            node[] (z) {$Z$}
						node[right of=z] (a) {$A$}
						node[right of=a] (m) {$M$}
						node[right of=m] (y) {$Y$}
                            node[above right of=m, xshift=-1.4cm] (c) {$C$} 
                            node[above right of=y, xshift=-1.4cm] (u) {$U$}
                            node[below of=m, yshift=1.4cm]() 
                            
					  (z) edge[blue] (a)	
                        (a) edge[blue] (m)
                        (m) edge[blue] (y)
                        (c) edge[blue] (a)
                        (c) edge[blue] (m)
                        (c) edge[blue] (y)
                        (u) edge[blue] (y)
                        (u) edge[blue] (a)
						;
					\end{scope}
                \end{tikzpicture}
                }
            \caption{}
            \label{subf:fdooriv-allcorrect}
        \end{subfigure}
        \begin{subfigure}[b]{0.49\linewidth}
                    \centering
                    \scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]
                    \begin{scope}[xshift=7.25cm]
                            \path[->, very thick]
						node[] (z) {$Z$}
						node[right of=z] (a) {$A$}
						node[right of=a] (m) {$M$}
						node[right of=m] (y) {$Y$}
                            node[above right of=m, xshift=-1.4cm] (c) {$C$} 
                            node[above right of=y, xshift=-1.4cm] (u) {$U$}
                           % node[below of=m, yshift=0.8cm]()
                            
					  (z) edge[blue] (a)	
                        (a) edge[blue] (m)
                        (c) edge[blue] (a)
                        (c) edge[blue] (m)
                        (c) edge[blue] (y)
                        (u) edge[blue] (y)
                        (a) edge[black, bend right=25] (y)
						;
						
					\end{scope}
                \end{tikzpicture}
                }
            \caption{}
            \label{subf:fdooriv-fdoorwrong}
        \end{subfigure}

        \quad
        
        \begin{subfigure}[b]{0.49\linewidth}
                    \centering
                    \scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]
                    \begin{scope}[xshift=14.5cm]
					   \path[->, very thick]
						node[] (z) {$Z$}
						node[right of=z] (a) {$A$}
						node[right of=a] (m) {$M$}
						node[right of=m] (y) {$Y$}
                            node[above right of=m, xshift=-1.4cm] (c) {$C$} 
                            node[above right of=y, xshift=-1.4cm] (u) {$U$}
                            node[below of=m, yshift=1cm]()
                            
					  (z) edge[blue] (a)	
                        (a) edge[blue] (m)
                        (m) edge[blue] (y)
                        (c) edge[blue] (a)
                        (c) edge[blue] (m)
                        (c) edge[blue] (y)
                        (u) edge[blue] (y)
					  (u) edge[black] (m)	
						;	
			\end{scope}
		\end{tikzpicture}
            }
            \caption{}
            \label{subf:fdooriv-fdoorwrong2}
          \end{subfigure}             
	  \begin{subfigure}[b]{0.49\linewidth}
                    \centering
                    \scalebox{0.8}{
				\begin{tikzpicture}[>=stealth, node distance=2cm]
					\tikzstyle{square} = [draw, thick, minimum size=1.0mm, inner sep=3pt]
     
                    \begin{scope}[]
                            \path[->, very thick]
						node[] (z) {$Z$}
						node[right of=z] (a) {$A$}
						node[right of=a] (m) {$M$}
						node[right of=m] (y) {$Y$}
                            node[above right of=m, xshift=-1.4cm] (c) {$C$} 
                            node[above right of=y, xshift=-1.4cm] (u) {$U$}
                           % node[below of=m, yshift=1.2cm]()
                            
					  (z) edge[blue] (a)	
                        (a) edge[blue] (m)
                        (m) edge[blue] (y)
                        (c) edge[blue] (a)
                        (c) edge[blue] (m)
                        (c) edge[blue] (y)
                        (u) edge[blue] (a)
                        (u) edge[blue] (y)
                        (z) edge[black, bend right=25] (y)
						;
						
					\end{scope}
				\end{tikzpicture}
                    }
                    \caption{}
                    \label{subf:fdooriv-ivwrong}
                \end{subfigure} 
	\caption{Causal DAGs for the data-generating distribution for the simulation with front-door and IV models. Violation of assumptions is shown via solid black edges.} % (a) Plausible causal models; (b) Correct IV model, but the front-door model doesn't hold because M doesn't intercept all directed paths from A to Y; (c) Correct IV model, but the front-door model doesn't hold because of an unblocked backdoor path from M to Y; (d) Correct front-door model, but the IV model doesn't hold because there is a direct effect from Z to Y.}
    \label{fig:front-iv-dgp}
\end{figure*}

We next present the  data-generating processes for the simulation study combining the front-door and IV models, the results of which are shown in Figure~\ref{fig: frontdoor-iv}. Figure~\ref{fig:front-iv-dgp} shows the causal DAGs for this simulation. Figure~\ref{subf:fdooriv-allcorrect} shows the causal DAG in the setting where both models are valid, which was used to generate the lines in the bottom right panels under the null and alternative of Figure~\ref{fig: frontdoor-iv}. The precise data-generating process for this setting is as follows. We first generate
\begin{align*}
U  &\sim \n{Unif}(-2,2) \\
C_i &\sim \n{Unif}(-2,2), \text{ for } i = 1,2,3,4 \\
Z &\sim \n{Bern}(0.5).
\end{align*}
We also define
\[\pi(c_1, c_2, c_3, u) = \n{expit}\left\{c_1 + \n{expit}(c_2) + \sin(c_3) + u \right\}.\]
We then simulate $\bar{A}(1) \sim \n{Bern}(\pi(C_1, C_2, C_3, U))$ and $\bar{A}(0) \sim \n{Bern}(1-\pi(C_1, C_2, C_3, U))$. To make the monotonicity assumption hold for the IV model, we then convert all defiers to compliers by setting $A(1) = 1$ and $A(0) = 0$ if $\bar{A}(1)=0$ and $\bar{A}(0) = 1$, and setting $A(1) = \bar{A}(1)$ and $A(0) = \bar{A}(0)$ otherwise. The observed treatment $A$ is then defined as $A = A(Z)$. Finally, we set
\begin{align*}
M &\sim \n{Bern}\left(\n{expit}\{5A - 1 + C_2\} \right)\\
Y &\sim N\left( \beta M + 3U + 2\sqrt{|C_1|} + \sin(C_4), 1\right).
\end{align*}
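As a compact restatement of the construction above (a sketch that only rewrites the stated rule and adds no modeling assumption), the defier-to-complier conversion is equivalent to
\begin{align*}
A(1) &= \max\{\bar{A}(0), \bar{A}(1)\}, \\
A(0) &= \min\{\bar{A}(0), \bar{A}(1)\}, \\
A &= Z A(1) + (1 - Z) A(0),
\end{align*}
so that $A(0) \leq A(1)$ for every unit and monotonicity holds by construction. Moreover, because $Y$ depends on $A$ only through $M$ in this setting, if the equations above are read as structural and the target is taken to be the average treatment effect (both are assumptions of this sketch), then
\[
E[Y(1) - Y(0)] = \beta\, E\left[\n{expit}(4 + C_2) - \n{expit}(C_2 - 1)\right],
\]
which equals zero if and only if $\beta = 0$, since the expit function is strictly increasing.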

Figure~\ref{subf:fdooriv-fdoorwrong} shows the causal DAG in the setting where the IV model is valid, but the front-door model is invalid due to a direct effect of $A$ on $Y$. This DAG was used to generate the line corresponding to ``Identified functional in wrong model $=$ 0'' in the bottom left panels under the null and alternative of Figure~\ref{fig: frontdoor-iv}. The data-generating process for this setting is the same as that described for setting (a) above, except that we change the equations for $\pi$ and $Y$ to
\begin{align*}
    \pi(c_1, c_2, c_3) &= \n{expit}\left\{c_1 + \n{expit}(c_2) + \sin(c_3)\right\} \\
    Y &\sim N\left( \beta A + 3U + 2\sqrt{|C_1|} + \sin(C_4), 1\right).
\end{align*}
Since $A$ has a direct effect on $Y$, the front-door model is invalid.

Figure~\ref{subf:fdooriv-fdoorwrong2} shows the causal DAG in the setting where the IV model is valid, but the front-door model is invalid due to an unblocked path from $M$ to $Y$ through $U$. This DAG was used to generate the line corresponding to ``Identified functional in wrong model $\neq$ 0'' in the bottom left panels under the null and alternative of Figure~\ref{fig: frontdoor-iv}. The data-generating process for this setting is the same as that described for setting (a) above, except that we change the equations for $\pi$ and $M$ to
\begin{align*}
    \pi(c_1, c_2, c_3) &= \n{expit}\left\{c_1 + \n{expit}(c_2) + \sin(c_3)\right\} \\
    M &\sim \n{Bern}\left(\n{expit}\{3A - 1 + C_2 + U\} \right).
\end{align*}
Since $U$ has an effect on both $M$ and $Y$ but is not in the adjustment set for the front-door model, the front-door model is invalid.

Figure~\ref{subf:fdooriv-ivwrong} shows the causal DAG in the setting where the front-door model is valid, but the IV model is invalid due to a direct effect of $Z$ on $Y$. This DAG was used to generate the line corresponding to ``Identified functional in wrong model $\neq$ 0'' in the top right panels under the null and alternative of Figure~\ref{fig: frontdoor-iv}. The data-generating process for this setting is the same as that described for setting (a) above, except that we change the equation for $Y$ to
\begin{align*}
    Y &\sim N\left( \beta M + 3U + 2\sqrt{|C_1|} + \sin(C_4) + 2Z, 1\right).
\end{align*}
Since $Z$ has a direct effect on $Y$, the IV model is invalid.


To simulate data where the front-door model is valid but the IV model is invalid (top right panels of Figure~\ref{fig: frontdoor-iv}) under the null and alternative when the identified functional in the IV model equals 0, we make the IV model invalid by violating the monotonicity assumption. As above, this violation has no graphical representation, so it is not displayed in Figure~\ref{fig:front-iv-dgp}. The equations for $U$, $C$, $Z$, and $\pi$ are as described for setting (a) above. We then simulate $A(1) \sim \n{Bern}(\pi(C_1, C_2, C_3, U))$ and $A(0) \sim \n{Bern}(1-\pi(C_1, C_2, C_3,U))$, and we set $A = A(Z)$. We change the equations for $M$ and $Y$ under the null to
\begin{align*}
    M &\sim I\{A(0) < A(1)\}\n{Bern}\left(\n{expit}\{2A - 1 + C_2\} \right) \\
    &\quad+ I\{A(0) \geq A(1)\}\n{Bern}\left(\n{expit}\{5A - 1 + C_2\} \right) \\
    Y &\sim N\left( \beta M + 2U + 2\sqrt{|C_1|} + \sin(C_4), 1 \right),
\end{align*}
and we change the equations for $M$ and $Y$ under the alternative to
\begin{align*}
    M &\sim I\{A(0) < A(1)\}\n{Bern}\left(\n{expit}\{2.38A - 1 + C_2\} \right) \\
    &\quad+ I\{A(0) \geq A(1)\}\n{Bern}\left(\n{expit}\{5A - 1 + C_2\} \right) \\
    Y &\sim N\left( \beta M + U + 2\sqrt{|C_1|} + \sin(C_4), 1 \right).
\end{align*}
Since there are ``defiers'' for whom $A(0) = 1$ but $A(1) = 0$, the IV model is invalid.
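
To give a sense of how large this violation is, the following back-of-the-envelope calculation may help. It assumes, which the description above does not state explicitly, that $A(1)$ and $A(0)$ are drawn independently given $(C_1, C_2, C_3, U)$. Writing $\pi = \pi(C_1, C_2, C_3, U)$, the conditional probabilities of the four compliance types would then be
\begin{align*}
P\{A(1) = 1, A(0) = 0 \mid C, U\} &= \pi^2 && \text{(compliers)} \\
P\{A(1) = 0, A(0) = 1 \mid C, U\} &= (1 - \pi)^2 && \text{(defiers)} \\
P\{A(1) = 1, A(0) = 1 \mid C, U\} &= \pi(1 - \pi) && \text{(always-takers)} \\
P\{A(1) = 0, A(0) = 0 \mid C, U\} &= \pi(1 - \pi) && \text{(never-takers)},
\end{align*}
which sum to one. Because the expit function takes values strictly between 0 and 1, the defier probability $(1-\pi)^2$ is strictly positive for every value of $(C, U)$ under this assumption.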



\subsection{Backdoor models with different adjustment sets}

Finally, we present the data-generating processes for the simulation study combining three backdoor models with different adjustment sets, the results of which are shown in Figure~\ref{fig:back-adjustments} and discussed in Section~\ref{sec:numerical}. We simulate data as follows:
\begin{align*}
    U &\sim \n{Unif}(-2,2) \\
    C_i &\sim \n{Unif}(-2,2), \text{ for } i = 1, 2,3,4 \\
    A &\sim \n{Bern}\left(\n{expit}\left\{C_1 + C_2\right\}\right) \\
    Y &\sim N\left(\beta A + 4C_2 + C_3 + U, 1\right).
\end{align*}
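As a check on what these equations imply, consider the following sketch, which assumes that $U$ and $C_1, \dots, C_4$ are mutually independent (the display suggests this but does not state it). Since $U$ and $C_3$ have mean zero and are then independent of $(A, C_2)$,
\begin{align*}
    E[Y \mid A = a, C_2 = c_2] &= \beta a + 4 c_2, \\
    E[Y \mid A = 1, C_2 = c_2] - E[Y \mid A = 0, C_2 = c_2] &= \beta,
\end{align*}
so conditioning on $C_2$ recovers $\beta$. By contrast, the unadjusted contrast is
\[
    E[Y \mid A = 1] - E[Y \mid A = 0] = \beta + 4\left\{E[C_2 \mid A = 1] - E[C_2 \mid A = 0]\right\},
\]
and the second term is positive because the probability of treatment is increasing in $C_2$.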

\section{ADDITIONAL SIMULATION RESULTS}
\label{sec:more sims}

Figures~\ref{fig: backdoor-frontdoor},~\ref{fig: backdoor-iv}, and~\ref{fig: frontdoor-iv} display the size and power of the test for the case of $K = 2$ when combining the backdoor and front-door, backdoor and IV, and front-door and IV models, respectively.

\begin{figure*}[h!]
  \centering
  \includegraphics[width=0.49\linewidth]{./images/backdoor_frontdoor_null.png}
  \includegraphics[width=0.49\linewidth]{./images/backdoor_frontdoor_alternative.png}
  \caption{Size (left) and power (right) of the test when combining the backdoor model $\s{M}_1$ and front-door model $\s{M}_2$ when at least one of the models holds.}
  \label{fig: backdoor-frontdoor}
\end{figure*}

\begin{figure*}[hbt!]
  \centering
  \includegraphics[width=0.49\linewidth]{./images/backdoor_iv_null.png}
  \includegraphics[width=0.49\linewidth]{./images/backdoor_iv_alternative.png}
  \caption{Size (left) and power (right) of the test when combining the backdoor model $\s{M}_1$ and IV model $\s{M}_3$ when at least one of the models holds.}
  \label{fig: backdoor-iv}
\end{figure*}

\begin{figure*}[hbt!]
  \centering
  \includegraphics[width=0.49\linewidth]{./images/frontdoor_iv_null.png}
  \includegraphics[width=0.49\linewidth]{./images/frontdoor_iv_alternative.png}
  \caption{Size (left) and power (right) of the test when combining the front-door model $\s{M}_2$ and IV model $\s{M}_3$ when at least one of the models holds.}
  \label{fig: frontdoor-iv}
\end{figure*}


\end{document}
