% \documentclass{uai2023} % for initial submission
\documentclass[accepted]{uai2023} % after acceptance, for a revised
                                    % version; also before submission to
                                    % see how the non-anonymous paper
                                    % would look like

%% There is a class option to choose the math font
% \documentclass[mathfont=ptmx]{uai2023} % ptmx math instead of Computer
% Modern (has noticeable issues)
% \documentclass[mathfont=newtx]{uai2023} % newtx fonts (improves upon
 % ptmx; less tested, no support)
% NOTE: Only keep *one* line above as appropriate, as it will be replaced
%       automatically for papers to be published. Do not make any other
%       change above this note for an accepted version.

%% Choose your variant of English; be consistent
\usepackage[american]{babel}
% \usepackage[british]{babel}

%% Some suggested packages, as needed:
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
% \usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams
\usepackage{caption}
\usepackage{subcaption} % allows for subfigures in figures
\usepackage{amsfonts} % allows for drawing the expected value character
\usepackage{amsthm}
\usetikzlibrary{arrows.meta} % allows for drawing edges with circles
\usepackage{algorithm}
\usepackage{algpseudocode}
\usepackage{array}
\usepackage{multirow}

% Operators for drawing in-text edges for PAGs, DAGs, ADMGs, CGs etc
\DeclareMathOperator{\circlearrow}{\hbox{$\circ$}\kern-1.5pt\hbox{$\rightarrow$}}
\DeclareMathOperator{\circlecircle}{\hbox{$\circ$}\kern-1.2pt\hbox{$--$}\kern-1.5pt\hbox{$\circ$}}
\DeclareMathOperator{\diedgeright}{\textcolor{blue}{\boldsymbol{\rightarrow}}}
\DeclareMathOperator{\diedgeleft}{\textcolor{blue}{\boldsymbol{\leftarrow}}}
\DeclareMathOperator{\biedge}{\textcolor{red}{\boldsymbol{\leftrightarrow}}}
\DeclareMathOperator{\udedge}{\textcolor{brown}{\boldsymbol{\textendash}}}

% Operators for relations in graphs
\DeclareMathOperator{\an}{an}
\DeclareMathOperator{\pa}{pa}
\DeclareMathOperator{\ch}{ch}
\DeclareMathOperator{\pre}{pre}
\DeclareMathOperator{\de}{de}
\DeclareMathOperator{\nd}{nd}
\DeclareMathOperator{\sib}{sib}
\DeclareMathOperator{\dis}{dis}
\DeclareMathOperator{\mb}{mb}

% Operator for ``do''
\DeclareMathOperator{\doo}{do}

% Operator for odds ratio
\DeclareMathOperator{\odds}{\text{OR}}

% Operators for optimization problems
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}

% Operators for independence, expectation and calligraphy G
\def\ci{\perp\!\!\!\perp}
\def\nci{\not\!\perp\!\!\!\perp}
\newcommand{\E}{\mathbb{E}}
\newcommand{\G}{\mathcal{G}}
\newcommand{\jacob}{\textcolor{orange}}
\newcommand{\rohit}{\textcolor{purple}}

\newtheorem{theorem}{Theorem}
\newtheorem{assumption}{Assumption}
\newtheorem{lemma}{Lemma}
\newtheorem{corollary}{Corollary}
\newtheorem{definition}{Definition}

\usepackage{xr}
\makeatletter

\newcommand*{\addFileDependency}[1]{% argument=file name and extension
\typeout{(#1)}% latexmk will find this if $recorder=0
% however, in that case, it will ignore #1 if it is a .aux or 
% .pdf file etc and it exists! If it doesn't exist, it will appear 
% in the list of dependents regardless)
%
% Write the following if you want it to appear in \listfiles 
% --- although not really necessary and latexmk doesn't use this
%
\@addtofilelist{#1}
%
% latexmk will find this message if #1 doesn't exist (yet)
\IfFileExists{#1}{}{\typeout{No file #1.}}
}\makeatother

\newcommand*{\myexternaldocument}[1]{%
\externaldocument{#1}%
\addFileDependency{#1.tex}%
\addFileDependency{#1.aux}%
}
%------------End of helper code--------------

\myexternaldocument{chen_476}

%% Provided macros
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)

\title{Causal Inference With Outcome-Dependent Missingness And Self-Censoring\\(Supplementary Material)}



% The standard author block has changed for UAI 2023 to provide
% more space for long author lists and allow for complex affiliations
%
% All author information is automatically removed by the class for the
% anonymous submission version of your paper, so you can already add your
% information below.
%
% Add authors
\author[1]{\href{mailto:<jmc8@williams.edu>?Subject=Your UAI 2023 paper}{Jacob M. Chen}}
\author[2]{\href{mailto:<d.malinsky@columbia.edu>?Subject=Your UAI 2023 paper}{Daniel Malinsky}}
\author[1]{\href{mailto:<rb17@williams.edu>?Subject=Your UAI 2023 paper}{Rohit Bhattacharya}}
% Add affiliations after the authors
\affil[1]{%
    Department of Computer Science \\
    Williams College
}
\affil[2]{%
    Department of Biostatistics \\
    Columbia University
}
  
\begin{document}
  
\onecolumn %% Turn this off if single column is desired for the supplement
\maketitle

\appendix
\section{Specifics of Data Generating Process}

For our simulations, we generate data according to the graph shown in Figure~\ref{fig:experiment_graph} and modifications of it that violate the shadow variable or backdoor conditions. We generate the pre-treatment covariates ${\bf W}$ from a multivariate normal distribution with mean ${\bf 0}$ and covariance matrix $\mathbf{\Sigma} =$
%
\begin{align*}
\begin{bmatrix}
    1.2 & 0 & 0 & 0\\
    0 & 1 & 0.4 & 0.4\\
    0 & 0.4 & 1 & 0.3\\
    0 & 0.4 & 0.3 & 1
    \end{bmatrix}.
\end{align*}
%
The above data generating process is equivalent to a structural equation model with correlated errors due to unmeasured confounders between the pairs $(W_2, W_3)$, $(W_2, W_4)$, and $(W_3, W_4)$. We generate $A$, $Y^{(1)}$, $I$, and $R_Y$ according to structural equation models that follow the edges in Figure~\ref{fig:experiment_graph}. We also clip all probabilities to lie between $0.01$ and $0.99$.
%To generate $R_Y$, we use the odds ratio factorization and parameterization specified in section \ref{sec:estimation}. The incentive variable $I$ is normally distributed with a mean of $0$ and variance $2$.

We generate $A$ as a binary variable with the following probabilities:
%
\begin{align*}
    p(A=1 \mid W_1, W_2, W_3, W_4) &= \text{expit}(0.52 + 2*W_1 + 2*W_2 + 2*W_3 + 2*W_4) \\
    p(A=0 \mid W_1, W_2, W_3, W_4) &= 1-p(A=1 \mid W_1, W_2, W_3, W_4)
\end{align*}
%
Next, $Y^{(1)}$ is generated similarly with the following probabilities:
%
\begin{align*}
    p(Y^{(1)}=1 \mid A, W_2, W_3, W_4) &= \text{expit}(3*A + 2*W_2 + 2*W_3 + 2*W_4) \\
    p(Y^{(1)}=0 \mid A, W_2, W_3, W_4) &= 1-p(Y^{(1)}=1 \mid A, W_2, W_3, W_4)
\end{align*}
%
The variable $I$ is simply a random normal variable with mean $0$ and variance $2$, i.e. $I \sim \mathcal{N}(0, 2)$.

We use the odds ratio parameterization to generate $R_Y$ with the following two probabilities. We first specify $p(R_Y=1 \mid Y^{(1)}=0, {\bf W} \setminus \{W_1\}, I)$, which represents the probability of $R_Y=1$ when $Y^{(1)}$ is at its chosen reference value of $0$. We then use that probability to generate $p(R_Y=1 \mid Y^{(1)}, {\bf W} \setminus \{W_1\}, I)$ at all values of $Y^{(1)}$.
%
\begin{align*}
    p(R_Y=1 \mid Y^{(1)}=0, {\bf W} \setminus \{W_1\}, I) &= \text{expit}(W_2 + W_3 + W_4 + 0.5*I)
\end{align*}
\begin{align*}
    p(R_Y&=1 \mid Y^{(1)}, {\bf W} \setminus \{W_1\}, I) = \\
    &\frac{p(R_Y=1 \mid Y^{(1)}=0, {\bf W} \setminus \{W_1\}, I)}{p(R_Y=1 \mid Y^{(1)}=0, {\bf W} \setminus \{W_1\}, I) + \text{exp}(-1.5*Y^{(1)}) \times (1-p(R_Y=1 \mid Y^{(1)}=0, {\bf W} \setminus \{W_1\}, I))}.
\end{align*}
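
As a quick check on this parameterization, the following sketch uses only the expressions given above, writing $\pi_0$ as shorthand for $p(R_Y=1 \mid Y^{(1)}=0, {\bf W} \setminus \{W_1\}, I)$ and $\text{logit}$ for the inverse of $\text{expit}$ (both introduced here for illustration) and ignoring the clipping of probabilities. Dividing the numerator and denominator by $\pi_0$ suggests that the missingness mechanism reduces to a single logistic model:
%
\begin{align*}
    \frac{\pi_0}{\pi_0 + \text{exp}(-1.5*Y^{(1)}) \times (1-\pi_0)}
    &= \frac{1}{1 + \text{exp}(-1.5*Y^{(1)}) \times \frac{1-\pi_0}{\pi_0}} \\
    &= \frac{1}{1 + \text{exp}\big(-(\text{logit}(\pi_0) + 1.5*Y^{(1)})\big)} \\
    &= \text{expit}(W_2 + W_3 + W_4 + 0.5*I + 1.5*Y^{(1)}).
\end{align*}
%
For instance, at $W_2 = W_3 = W_4 = I = 0$ we have $\pi_0 = 0.5$, so the probability of observing the outcome is $0.5$ when $Y^{(1)}=0$ and $\text{expit}(1.5) \approx 0.818$ when $Y^{(1)}=1$.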

In the case where we add $A \diedgeright R_Y$ to Figure~\ref{fig:experiment_graph}, we add $1.5*A$ in the expit function for $p(R_Y=1 \mid Y^{(1)}=0, \cdot)$.

\clearpage

\section{Proof of Theorem \ref{thm:identification}}

We first note that the presence of $R_Y$ in the numerator ensures that we only use observed rows of data. Further, the propensity scores in the denominator are identified: $p(A\mid {\bf Z})$ only depends on observed quantities, and $p(R_Y=1\mid A, Y^{(1)}, {\bf Z}) = p(R_Y=1\mid Y^{(1)}, {\bf Z})$ is identified using S1, S2, and the completeness condition. We now prove that the proposed identifying functional is equal to the backdoor adjustment functional and counterfactual mean under the full data law.

\begin{proof}
%
\begin{align*}
    & \E\bigg[\frac{R_Y \times \mathbb{I}(A=a) \times Y}{p(R_Y=1 \mid Y^{(1)}, {\bf Z}) \times p(A=a \mid {\bf Z})} \bigg] \\
    &=^{(1)} \sum_{R_Y, Y^{(1)}, A, {\bf Z}, Y} p(R_Y, Y^{(1)}, A, {\bf Z}, Y) \times \frac{R_Y \times \mathbb{I}(A=a) \times Y}{p(R_Y=1 \mid Y^{(1)}, {\bf Z}) \times p(A=a \mid {\bf Z})} \\
    &=^{(2)} \sum_{Y^{(1)}, A, {\bf Z}} p(R_Y=1, Y^{(1)}, A, {\bf Z}) \times \frac{\mathbb{I}(A=a) \times Y^{(1)}}{p(R_Y=1 \mid Y^{(1)}, {\bf Z}) \times p(A=a \mid {\bf Z})} \\
    &=^{(3)} \sum_{Y^{(1)}, A, {\bf Z}} p(R_Y=1 \mid Y^{(1)}, A, {\bf Z}) p(Y^{(1)} \mid A, {\bf Z}) p(A \mid {\bf Z}) p({\bf Z}) \times \frac{\mathbb{I}(A=a) \times Y^{(1)}}{p(R_Y=1 \mid Y^{(1)}, {\bf Z}) \times p(A=a \mid {\bf Z})} \\
    &=^{(4)} \sum_{Y^{(1)}, {\bf Z}} p(R_Y=1 \mid Y^{(1)}, {\bf Z}) p(Y^{(1)} \mid A=a, {\bf Z}) p(A=a \mid {\bf Z}) p({\bf Z}) \times \frac{Y^{(1)}}{p(R_Y=1 \mid Y^{(1)}, {\bf Z}) \times p(A=a \mid {\bf Z})} \\
    &=^{(5)} \sum_{Y^{(1)}, {\bf Z}} p(Y^{(1)} \mid A=a, {\bf Z}) \times p({\bf Z}) \times Y^{(1)} \\
    &=^{(6)} \sum_{{\bf Z}} \E[Y^{(1)} \mid A=a, {\bf Z}] \times p({\bf Z}) =^{(7)} \E[Y^{(a, 1)}].
\end{align*}
%
In (1) we apply the law of the unconscious statistician; in (2) we evaluate the sum over $R_Y$ and use missing data consistency; in (3) we apply the chain rule of probability; in (4) we evaluate the sum over $A$ and drop $A$ from the propensity score of $R_Y$ due to condition S2; (5) follows from cancellation of common terms in the numerator and denominator; (6) follows from the definition of expectation; and the last step (7) follows from the fact that ${\bf Z}$ satisfies the backdoor conditions B1 and B2.
\end{proof}
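
The identifying functional suggests a plug-in inverse probability weighted estimate. As a minimal sketch, one could replace the expectation with a sample average. The fitted propensity models $\hat{p}$ and the sample index $i = 1, \dots, n$ are notation introduced here for illustration, and this is not necessarily the estimator used in the main paper:
%
\begin{align*}
    \widehat{\E}[Y^{(a, 1)}] = \frac{1}{n} \sum_{i=1}^{n} \frac{R_{Y,i} \times \mathbb{I}(A_i=a) \times Y_i}{\hat{p}(R_Y=1 \mid Y^{(1)}=Y_i, {\bf Z}_i) \times \hat{p}(A=a \mid {\bf Z}_i)}.
\end{align*}
%
Rows with $R_{Y,i}=0$ contribute zero to the sum, so the propensity score of $R_Y$ only needs to be evaluated at observed values of the outcome, where $Y_i = Y_i^{(1)}$ by missing data consistency.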

\section{Proof of Equation \ref{eq:or_factorization}}

For completeness, we provide a proof of the odds ratio parameterization of the propensity score in \eqref{eq:or_factorization}. First, from \cite{chen2007semiparametric} we have an odds ratio factorization of the joint distribution $p(R_Y, Y^{(1)} \mid {\bf Z})$ as follows,
%
\begin{align}
    p(R_Y, Y^{(1)} \mid {\bf Z}) &= \frac{p(R_Y \mid Y^{(1)}=y_0, {\bf Z}) \times p(Y^{(1)} \mid R_Y=1, {\bf Z}) \times \odds(Y^{(1)}, R_Y \mid {\bf Z})}{\sum_{R_Y, Y^{(1)}} p(R_Y \mid Y^{(1)}=y_0, {\bf Z}) \times p(Y^{(1)} \mid R_Y=1, {\bf Z}) \times \odds(Y^{(1)}, R_Y \mid {\bf Z})},
    \label{eq:chen_or_factorization}
\end{align}
%
where $y_0$ is a reference value for $Y^{(1)}$, $1$ is the reference value for $R_Y$, and the denominator is a normalizing function. Let $\psi = p(R_Y \mid Y^{(1)}=y_0, {\bf Z}) \times p(Y^{(1)} \mid R_Y=1, {\bf Z}) \times \odds(Y^{(1)}, R_Y \mid {\bf Z})$ and $\psi_1 = \left. \psi \right|_{R_Y=1}$. As in the main paper, $\pi_0({\bf Z}) \coloneqq p(R_Y =1 \mid Y^{(1)} = y_0, {\bf Z})$ and $\eta(Y^{(1)}, {\bf Z}) \coloneqq \odds(R_Y=0, Y^{(1)} \mid {\bf Z})$. We present the proof and an explanation of each step below.

(1) and (2) follow from standard laws of probability.
In (3), we apply the odds ratio factorization in \eqref{eq:chen_or_factorization} to both the numerator and the denominator. In (4), we cancel like terms in the numerator and denominator. In (5), we expand $\psi_1$ and $\psi$ according to their definitions. In (6), we note that $\odds(Y^{(1)}, R_Y=1 \mid {\bf Z})$ has $R_Y$ at its reference value of $1$; hence, it is equal to $1$. Further, we move the term $p(Y^{(1)} \mid R_Y=1, {\bf Z})$ outside of the sum in the denominator because this term is not a function of $R_Y$. In (7), we cancel like terms in the numerator and denominator. Finally, in (8), we explicitly write out the sum over $R_Y$, which has only two possible values. When $R_Y=1$, $R_Y$ is at its reference value in the odds ratio, so the odds ratio term equals $1$ and we are left with $\pi_0({\bf Z})$. When $R_Y=0$, we know that $p(R_Y=0 \mid Y^{(1)}=y_0, {\bf Z}) = 1-\pi_0({\bf Z})$ and that neither $Y^{(1)}$ nor $R_Y$ is fixed at its reference value in the odds ratio term. Therefore, the odds ratio term remains.

%
\begin{proof}
    \begin{align*}
        p(R_Y=1 \mid Y^{(1)}, {\bf Z}) &=^{(1)} \frac{p(R_Y=1, Y^{(1)} \mid {\bf Z})}{p(Y^{(1)} \mid {\bf Z})} \\
        &=^{(2)} \frac{p(R_Y=1, Y^{(1)} \mid {\bf Z})}{\sum_{R_Y} p(R_Y, Y^{(1)} \mid {\bf Z})} \\
        &=^{(3)} \frac{\frac{\psi_1}{\sum_{R_Y, Y^{(1)}} \psi}}{\frac{\sum_{R_Y} \psi}{\sum_{R_Y, Y^{(1)}} \psi}} \\
        &=^{(4)} \frac{\psi_1}{\sum_{R_Y} \psi} \\
        &=^{(5)} \frac{p(R_Y=1 \mid Y^{(1)}=y_0, {\bf Z}) \times p(Y^{(1)} \mid R_Y=1, {\bf Z}) \times \odds(Y^{(1)}, R_Y=1 \mid {\bf Z})}{\sum_{R_Y} p(R_Y \mid Y^{(1)}=y_0, {\bf Z}) \times p(Y^{(1)} \mid R_Y=1, {\bf Z}) \times \odds(Y^{(1)}, R_Y \mid {\bf Z})} \\
        &=^{(6)} \frac{p(R_Y=1 \mid Y^{(1)}=y_0, {\bf Z}) \times p(Y^{(1)} \mid R_Y=1, {\bf Z})}{p(Y^{(1)} \mid R_Y=1, {\bf Z}) \times \sum_{R_Y} p(R_Y \mid Y^{(1)}=y_0, {\bf Z}) \times \odds(Y^{(1)}, R_Y \mid {\bf Z})} \\
        &=^{(7)} \frac{p(R_Y=1 \mid Y^{(1)}=y_0, {\bf Z})}{\sum_{R_Y} p(R_Y \mid Y^{(1)}=y_0, {\bf Z}) \times \odds(Y^{(1)}, R_Y \mid {\bf Z})} \\
        &=^{(8)} \frac{\pi_0({\bf Z})}{\pi_0({\bf Z}) + \eta(Y^{(1)}, {\bf Z})(1-\pi_0({\bf Z}))}
    \end{align*}
\end{proof}
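
As a sanity check on the final expression, the following sketch relies only on the standard property that an odds ratio equals $1$ when either of its arguments is at its reference value. Setting $Y^{(1)} = y_0$ gives $\eta(y_0, {\bf Z}) = 1$, and hence
%
\begin{align*}
    p(R_Y=1 \mid Y^{(1)}=y_0, {\bf Z}) = \frac{\pi_0({\bf Z})}{\pi_0({\bf Z}) + 1 \times (1-\pi_0({\bf Z}))} = \pi_0({\bf Z}),
\end{align*}
%
which recovers the definition of $\pi_0({\bf Z})$. In the data generating process described earlier in this supplement, the reference value is $y_0 = 0$, and the term appearing in the position of $\eta$ is $\text{exp}(-1.5*Y^{(1)})$, which indeed equals $1$ at $Y^{(1)}=0$.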


\section{Additional Simulation Results}

We report additional results for simulations of the search algorithm described in Section~\ref{sec:experiments}. Table~\ref{tab:covariate_search_pval0.01} gives the results when using a significance level of $\alpha=0.01$ to conclude dependence between two variables.
%
\begin{table}[ht]
    \centering
    \begin{tabular}{|c|c|c|}
    \hline
    {\bf Sample Size} & {\bf Sensitivity} & {\bf Specificity} \\ \hline
    500   & 0.0 & 0.394 \\ \hline
    2500  & 0.269 & 0.383 \\ \hline
    5000  & 0.690 & 0.717 \\ \hline
    10000 & 0.828 & 0.952 \\ \hline
    \end{tabular}
    \caption{Sensitivity and specificity of the tests for different sample sizes at $\alpha=0.01$.}
    \label{tab:covariate_search_pval0.01}
\end{table}

Next, Table~\ref{tab:covariate_search_pval0.1} gives the simulation results when using a significance level of $\alpha=0.1$ to conclude dependence between two variables.

\begin{table}[ht]
    \centering
    \begin{tabular}{|c|c|c|}
    \hline
    {\bf Sample Size} & {\bf Sensitivity} & {\bf Specificity} \\ \hline
    500   & 0.022 & 0.363 \\ \hline
    2500  & 0.671 & 0.630 \\ \hline
    5000  & 0.837 & 0.770 \\ \hline
    10000 & 0.949 & 0.857 \\ \hline
    \end{tabular}
    \caption{Sensitivity and specificity of the tests for different sample sizes at $\alpha=0.1$.}
    \label{tab:covariate_search_pval0.1}
\end{table}

As the significance level $\alpha$ increases, the accuracy of the tests in correctly predicting an adjustment set when one is possible increases. On the other hand, the accuracy of the tests in correctly identifying that there is no possible adjustment set when no such set exists decreases as $\alpha$ increases. For all values of $\alpha$, the accuracy of the tests generally increases as the sample size increases.
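
To make the two accuracy measures concrete, the following sketch reads the column labels of the tables above in their standard sense. This reading is an assumption, and the counts $\text{TP}$, $\text{FN}$, $\text{TN}$, and $\text{FP}$ are notation introduced here for illustration:
%
\begin{align*}
    \text{Sensitivity} = \frac{\text{TP}}{\text{TP} + \text{FN}}, \qquad
    \text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}}.
\end{align*}
%
Under this reading, a positive is a simulated dataset for which an adjustment set is possible, $\text{TP}$ and $\text{FN}$ count the positives for which the search does and does not correctly report one, and $\text{TN}$ and $\text{FP}$ count the datasets with no possible adjustment set for which the search does and does not correctly report that none exists.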

\begin{figure}[ht]
    \centering
    \includegraphics[scale=0.6]{graphs/sample_size500.png}
    \caption{Estimation results for sample size 500.}
    \label{fig:estimation_500}
\end{figure}

Next, we report additional simulation results for estimation of the causal effect described in Section~\ref{sec:experiments}. Figure~\ref{fig:estimation_500} shows the estimation results for sample size 500. When using the correct adjustment method, our practical estimation method is able to accurately recover the causal effect. However, when using the full pipeline of our method, the estimates are fairly inaccurate. This is to be expected, as the sensitivity of the covariate search at a small sample size is quite low regardless of the significance level. In addition, due to missing data, the effective sample size of the data is roughly $300$.

\bibliography{chen_476}

\end{document}
