\newpage
\appendix

\newcommand{\M}{\mathcal{M}}
\newcommand{\cR}{\mathcal{R}}
\newcommand{\cS}{\mathcal{S}}
\newcommand{\cT}{\mathcal{T}}
\newcommand{\cU}{\mathcal{U}}
\newcommand{\cV}{\mathcal{V}}
\newcommand{\cW}{\mathcal{W}}
\newcommand{\Ind}{\mathbb{I}}
\newcommand{\Var}{{\operatorname{Var}}}

\section{Related Work}

\paragraph{Network Interference}
Network interference is a well-studied topic in the causal inference literature, with a variety of methods proposed for the problem. Existing works in this area incorporate various sets of assumptions to estimate treatment effects. A common approach is the exposure mapping framework, which defines a degree of ``belonging'' of a unit to either the treatment or control group \citep{AronowSamii17, auerbach2021local, li2021causal, viviano2020experimental}. Typically, linearity with respect to neighbouring treatments is also assumed \citep{EcklesKarrerUgander17, leung2022causal,zhang23a,wang2017efficient}, but it is not necessary \citep{sussman2017elements}.
%A common assumption is that the network effect is linear with respect the neighbour treatments. 
A limitation of these approaches is that they require complete knowledge of the network structure. 
%While our approach also relies on imposing an exposure-based structure to the form of interference, however \emph{we work with an incomplete knowledge of the network}.


%These models are also related to dose-response literature, as the primary role of the exposure function is to provide a summary statistic which encapsulates all necessary information of the effective treatment.




%Another common assumption is the heterogeneous linear interference model \citep{EcklesKarrerUgander17,SussmanAiroldi17}, which posits an additive effect of neighbourhood treatments. 

%Exposure assumptions reduce the number of unknown parameters in the model to a fixed dimension that does not grow with the population size, reducing the inference task to model fitting task. As a result, the natural solution is to use a least squares regression, shifting the focus to constructing randomized designs that minimize the variance of the estimate. A limitation of this approach is that one needs the knowledge of the network structure to compute the correct neighborhood statistics. 

Treatment effect estimation with unknown network interference has been studied beginning with the seminal work of \citet{hudgens_halloran08}. Other works such as \citet{auerbach2021local,pmlr-v115-bhattacharya20a,LiuHudgens14,TchetgenVanderWeele12,VanderweeleTchetgenHalloran14} have extended this idea further. Often the bias of these estimators depends on the number of edges between the clusters, but constructing good clusters is known to be computationally intractable \citep{pouget2018dealing}. This has led to the development of various heuristic methods for constructing clusters~\citep{EcklesKarrerUgander17, GuiXuBhasinHan15}. However, these methods still require the graph to be static and not treatment dependent. In contrast, \emph{our method can handle treatment dependence in general unstructured graphs}. 
%Finally, there are methods, which under restrictive assumptions, use SUTVA based estimates for one-sided hypothesis tests for treatment effect under interference \citep{choi2017estimation,athey2019estimating,lazzati2015treatment}.



\paragraph{Estimation with Misspecifications and Mismeasurements}

The estimation of treatment effects in the presence of model misspecification is an important problem in causal inference, with numerous methods and heuristics proposed to address this challenge \citep{carroll2006measurement,ogburn2013bias,lockwood2016matching}. A comprehensive overview of this problem can be found in \citet{yi2021handbook,vansteelandt2012model}. Various approaches have been proposed for handling misspecification in the noise model \citep{dukes2021inference}, propensity weights \citep{kreif2016evaluating}, confounders \citep{pearl2012measurement,schuster2023misspecification}, and mediators \citep{valeri2014estimation,dukes2023proximal}.


A problem related to misspecified models is noisy measurements. In general, access to noisy variables is not sufficient to identify causal effects \citep{kuroki2014measurement,hernan2020causal}. Some research on this problem \citep{dukes2023proximal,cui2024semiparametric} uses ideas from proximal causal inference \citep{tchetgen2020introduction}. However, these methods require knowledge of multiple proxy variables. A different approach has been to focus on bounding treatment effects rather than estimating them precisely. This line of work includes methods for sensitivity analysis \citep{imbens2003sensitivity,veitch2020sense,dorie2016flexible} and partial identification under various assumptions \citep{zhao2017sensitivity,yadlowsky2018bounds,zhang2020bounding,yin2021conformal,guo2022partial}. A similar analysis for missing data has been conducted for missing mediators \citep{li2017identifiability} and outcomes \citep{cornelisz2020addressing}.

Existing methods for causal effect estimation under imprecise networks often require additional information to mitigate bias. For example, some approaches leverage repeated measurements to reduce the impact of noise \citep{shankar22cookie,YuCortezEichhorn22}, while others rely on a gold standard sample of measurements to calibrate or correct noisy data \citep{shankar23diet}. \textit{These strategies, however, do not apply when the networks are treatment dependent: unlike in these earlier works, the noise then acts as an endogenous variable, which requires specialized techniques.}
%attention highlight the importance of supplementary data sources or validation samples in improving the accuracy of causal inference in the presence of measurement error. By incorporating such information, researchers can develop more reliable estimates of treatment effects, even when the primary data is subject to noise.


%Parameter estimation with measurement noise is a well studied problem in causal inference~\citep{wickens1972note,frost1979proxy}.
%Many methods and heuristics have been proposed for estimation of treatment effect~\citep{carroll2006measurement,schennach2016recent,ogburn2013bias,lockwood2016matching} with measurement noise in data. \citet{yi2021handbook} provides an overview of recent literature on the bias introduced by measurement error on causal estimation. Earlier works have focused on qualitative analysis by encoding assumptions of the error mechanism into a causal graph \cite{hernan2020causal}, outcome \cite{shu2019causal}, confounders \cite{pearl2012measurement, miles2018class} and mediators \cite{valeri2014estimation}.

%Noisy covariates or proxy variables are  in general not sufficient to identify causal effects \citep{kuroki2014measurement}. As such works some works have considered partial identification of treatment effects \citep{zhao2017sensitivity,yadlowsky2018bounds,zhang2020bounding,yin2021conformal,guo2022partial} and sensitivity analysis \citep{imbens2003sensitivity,veitch2020sense,dorie2016flexible}. 
%Existing methods for estimating causal effects under noise rely upon additional information such as repeated measurements \citep{shankar22cookie,YuCortezEichhorn22} or a gold standard sample of measurements \citep{shankar23diet}.

%such as \citet{kuroki2014measurement,miao2018identifying,shpitser2021proximal,dukes2021proximal,ying2021proximal,guo2022partial} have focused on identifying criteria for treatment effect estimation with noisy measurements with confounding variables.
%Methods based on assuming knowledge of the error model are also common \citep{gustafson2003measurement,shpitser2021proximal,fang2023predictors}. Existing methods for estimating causal effects under noise rely upon additional information such as repeated measurements \citep{shankar22cookie,YuCortezEichhorn22}, instrumental variables \citep{zhu2022causal,tchetgen2020introduction} or a gold standard sample of measurements \citep{shankar23diet}.
%While few works have also tried to study causal inference with measurement errors and no side information \cite{miles2018class,pollanen2023identifiable}, these works focus on noisy measurements of unknown confounders or covariates, \emph{whereas our focus is on uncertain network interference}. Finally, some works have considered partial identification of treatment effects \citep{zhao2017sensitivity,yadlowsky2018bounds,zhang2020bounding,yin2021conformal,guo2022partial} and sensitivity analysis \citep{imbens2003sensitivity,veitch2020sense,dorie2016flexible}. 

%These models make strong assumptions in order to ensure the identifiability of the treatment effect. Other works have focused on partial identification of treatment effects \citep{zhao2017sensitivity,yadlowsky2018bounds,zhang2020bounding,yin2021conformal,guo2022partial}, sensitivity analysis \citep{liu2013introduction,richardson2014nonparametric,imbens2003sensitivity,veitch2020sense,dorie2016flexible} or on trying to identify confounders~\citep{ranganath2018multiple,d2019multi,wang2019blessings,miao2020identifying}.
%\citet{imai2010causal, yadlowsky2018bounds} propose a constrained optimization based approach that quantifies bounds on the treatment effects.




% Noisy covariates or proxy variables are not generally sufficient to identify causal effects as they violate the ``no unobserved confounders'' assumption. 
% 

%proposed cluster randomized designs that randomize over clusters that are constructed to minimize the number of edges between clusters \citep{EcklesKarrerUgander17, GuiXuBhasinHan15}. Constructing good clusters is computationally intensive \citep{abadi2020}

%\end{comment}



\paragraph {Inverse Propensity/Horvitz-Thompson Estimate}
If the graph is known and all treatment decisions are set independently with probability $p$, one can use the classic Horvitz-Thompson estimator (or inverse propensity estimator):

$$
\tau_{HT} =  \frac{1}{n} \sum_i Y_i \left(  \frac{\prod_{j \in \mathcal{N}_i} z_j}{ \prod_{j \in \mathcal{N}_i } p} - \frac{\prod_{j \in \mathcal{N}_i} (1-z_j)}{ \prod_{j \in \mathcal{N}_i } (1-p)} \right) 
$$
 

A similar formula exists for the H\'ajek-style estimator, with the denominators $\prod_{j \in \mathcal{N}_i } p$ and $\prod_{j \in \mathcal{N}_i } (1-p)$ replaced by their self-normalized values.
This estimator filters out every unit whose neighbours are not all in the treatment group or all in the control group, and it is not meaningful when no unit has all of its neighbours in a single group. For example, with a $k$-regular interference graph with $k=20$ and $p=0.5$, we need around a million nodes for the HT estimate to even have a meaningful value.
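
As a back-of-the-envelope check of this figure (a sketch that assumes each unit has exactly $k$ neighbours, independent $\text{Bernoulli}(p)$ assignments, and ignores the unit's own treatment), the probability that all neighbours of a given unit are treated is
\[
p^{k} = 0.5^{20} = 2^{-20} \approx 9.5 \times 10^{-7},
\]
so the expected number of units with a fully treated neighbourhood is $n p^{k}$, which reaches one only when $n \geq 2^{20} \approx 1.05 \times 10^{6}$. The same count applies to fully control neighbourhoods, since $1-p = p$ in this example.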


However, even when HT estimates provide reasonable values, they do not work with dynamic or treatment-dependent graphs.

%However, computing these requires the entire graph to be known.
%These estimator only considers those units for which all the neighbours are in control or treatment groups.  Since the unbiasedness of these estimators doesn't rely on correct exposure specification, these methods remain unbiased if instead of the exact neighbourhood one uses a superset of neighbours. However, as this increases the size of neighbourhood, this further adds to the variance and existence issues of these estimators.

\appendix

\setcounter{theorem}{0}
\setcounter{thm}{0}
\onecolumn


%\newcommand{\cR}{\mathcal{R}}
%\newcommand{\cS}{\mathcal{S}}
%\newcommand{\cT}{\mathcal{T}}
%\newcommand{\cU}{\mathcal{U}}
%\newcommand{\cV}{\mathcal{V}}
%\newcommand{\cW}{\mathcal{W}}
%\newcommand{\Ind}{\mathbb{I}}

% I \textit{think} the WIS/self-normalized estimator would be, let $\rho_j=z_j/p$ and let $\rho'_j = (1-z_j)/(1-p)$,
% \begin{align}
%     \frac{1}{n}\sum_i Y_i
%     \left( \sum_{k\in \mathcal{N}_i} \frac{\rho_k}{\frac{1}{n}\sum_j \frac{1}{|N_j|}\sum_{k\in N_j} \rho_k} -  \frac{\rho'_k}{\frac{1}{n}\sum_j \frac{1}{|N_j|}\sum_{k\in N_j} \rho'_k}\right)
% \end{align}
% as in the limit $\frac{1}{n}\sum_j \frac{1}{|N_j|} \sum_{k\in N_j} \rho_k = \frac{1}{n}\sum_j \frac{1}{|N_j|}\sum_{k\in N_j} \rho'_k = 1$


\section{Proofs}


\begin{lemma} \label{lem:exp_prod}
    Suppose that $\{z_i\}_{i=1..n}$ are mutually independent, with $z_i \sim \text{Bernoulli}(p)$. Then, for any set of indices $S, S' \subset[n]$, and stochastic function $f$ we have
    \[
        \E \Big[ \prod_{i \in S} \Big( \frac{z_i}{p} - \frac{1-z_i}{1-p} \Big) \prod_{j \in S'} f(z_{j}) \Big] =
        \begin{cases}
        (\E[f(1)]-\E[f(0)])^{|S\cap S'|} \E[f(z)]^{|S'\setminus S|} & \text{if } S \subseteq S'\\    
        0 & \text{otherwise}\\
        \end{cases}
    \]
\end{lemma}

% \begin{proof}
% Fix $S, S'$. A given index (node) $i$ can either be only in $S$ or only in $S'$ or in both, with only one of the possibilities being true. Correspondingly the product, $\prod_{i \in S} \Big(  \frac{z_i}{p} - \frac{1-z_i}{1-p} \Big) \prod_{j \in S'} z_{j} $ can be factored into three different products:

% $$
%  \prod_{i \in S} \Big( \frac{z_i}{p} - \frac{1-z_i}{1-p} \Big) \prod_{j \in S'} z_{j}  =  \prod_{i \in S \setminus S'} \Big( \frac{z_i}{p} - \frac{1-z_i}{1-p} \Big)  \prod_{k \in S \cap S'}  z_k \Big(\frac{z_k}{p} - \frac{1-z_k}{1-p} \Big) \prod_{j \in S' \setminus S}  z_j
% $$
% Applying expectations and noting that  $z_i$ are mutually independent, we get:
% $$
% \prod_{i \in S \setminus S'} \Big( \E \big[ \frac{z_i}{p} - \frac{1-z_i}{1-p} \big] \Big)  \prod_{k \in S \cap S'} \E \big[ z_k \Big(\frac{z_k}{p} - \frac{1-z_k}{1-p} \Big)\big] \prod_{j \in S' \setminus S}  \E [z_j] = \prod_{i \in S \setminus S'} (1 - 1)\prod_{k \in S \cap S'} 1 \prod_{j \in S' \setminus S} p
% $$
% The RHS can only be non zero if $S \setminus S' = \{\}$ i.e. $S \subseteq S'$, and in that case is equal to $\prod_{j \in S' \setminus S}  = p^{|S' \setminus S|}$
% \end{proof}




\begin{proof}
Fix $S, S'$. A given index (node) $i$ can either be only in $S$ or only in $S'$ or in both, with only one of the possibilities being true. Correspondingly the product, $\prod_{i \in S} \Big(  \frac{z_i}{p} - \frac{1-z_i}{1-p} \Big) \prod_{j \in S'} f(z_{j}) $ can be factored into three exclusive products:

$$
 \prod_{i \in S} \Big( \frac{z_i}{p} - \frac{1-z_i}{1-p} \Big) \prod_{j \in S'} f(z_{j})  =  \prod_{i \in S \setminus S'} \Big( \frac{z_i}{p} - \frac{1-z_i}{1-p} \Big)  \prod_{k \in S \cap S'}  f(z_k) \Big(\frac{z_k}{p} - \frac{1-z_k}{1-p} \Big) \prod_{j \in S' \setminus S}  f(z_j)
$$
Applying expectations and noting that  $z_i$ are mutually independent, we get:
{\small
$$
\prod_{i \in S \setminus S'}  \E \big[ \frac{z_i}{p} - \frac{1-z_i}{1-p} \big]  \prod_{k \in S \cap S'} \E \big[ f(z_k )\Big(\frac{z_k}{p} - \frac{1-z_k}{1-p} \Big)\big] \prod_{j \in S' \setminus S}  \E f(z_j) = \prod_{i \in S \setminus S'} 0\prod_{k \in S \cap S'} \frac{\E[z_kf(z_k)] -p\E[f(z_k)]}{p(1-p)} \prod_{j \in S' \setminus S} \E[f(z_j)]
$$
}%
The RHS can only be non-zero if $S \setminus S' = \emptyset$, i.e. $S \subseteq S'$.

Since $\E\left[f(z_k)\left(\frac{z_k}{p} - \frac{1-z_k}{1-p} \right)\right] = p \cdot \E[f(1)] \cdot \frac{1}{p} + (1-p) \cdot \E[f(0)] \cdot \left(\frac{-1}{1-p}\right) = \E[f(1)] - \E[f(0)]$, the RHS, when it is non-zero, simplifies to
$$(\E[f(1)]-\E[f(0)])^{|S\cap S'|} \E[f(z)]^{|S'\setminus S|}$$
\end{proof}

\begin{corollary}
\label{lem:corr_id}
    By putting $f(z) = z$ in Lemma \ref{lem:exp_prod} we get
     \[
        \E \Big[ \prod_{i \in S} \Big( \frac{z_i}{p} - \frac{1-z_i}{1-p} \Big) \prod_{j \in S'} z_{j} \Big] =
        \begin{cases}
        p^{|S' \setminus S|} & \text{if } S \subseteq S'\\    
        0 & \text{otherwise}\\
        \end{cases}
    \]
\end{corollary}
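
As an illustration of Corollary \ref{lem:corr_id} in the smallest cases (a sanity check, not part of the proof), take $S = \{1\}$. For $S' = \{1\}$, only the outcome $z_1 = 1$ contributes, so
\[
\E\Big[ \Big( \frac{z_1}{p} - \frac{1-z_1}{1-p} \Big) z_1 \Big] = p \cdot \frac{1}{p} \cdot 1 + (1-p) \cdot \Big( -\frac{1}{1-p} \Big) \cdot 0 = 1 = p^{0}.
\]
For $S' = \{1,2\}$, independence contributes an extra factor $\E[z_2] = p$, so the expectation equals $p = p^{|S' \setminus S|}$. For $S' = \{2\}$, where $S \not\subseteq S'$, the expectation factorizes as
\[
\E\Big[ \frac{z_1}{p} - \frac{1-z_1}{1-p} \Big] \, \E[z_2] = (1 - 1) \cdot p = 0.
\]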


\begin{lemma} \label{lem:help_beta}
    Suppose that $\{z_i\}_{i=1..n}$ are mutually independent, with $z_i \sim \text{Bernoulli}(p)$. Then, for any subsets $S, S'$ and stochastic functions $f_i$,
    $\E[\prod_{i \in S} f_i(z_i) \prod_{j \in S'} \frac{z_j -p}{p}] = \prod_{i \in S \setminus S'} \E[f_i(z_i)]  \prod_{k \in S \cap S'}\left((1-p)(\E[f_k|z_k=1] - \E[f_k|z_k=0])\right) \Ind[S' \subseteq S]$.
\end{lemma}

\begin{proof}
    Fix $S, S'$. A given index (node) $i$ can be only in $S$, only in $S'$, or in both, with exactly one of these possibilities being true. Correspondingly, the product $\prod_{i \in S} f_i(z_i) \prod_{j \in S'} \frac{z_j -p}{p}$ can be factored into three exclusive products:

\begin{align*}
\E[\prod_{i \in S} f_i(z_i) \prod_{j \in S'} \frac{z_j -p}{p}] &= \E[\prod_{i \in S \setminus S'} f_i(z_i)  \prod_{k \in S \cap S'} f_k(z_k)\frac{z_k -p}{p} \prod_{j \in S' \setminus S} \frac{z_j -p}{p}] \\
&= \prod_{i \in S \setminus S'} \E[f_i(z_i)]  \prod_{k \in S \cap S'}\E[f_k(z_k)\frac{z_k -p}{p}]\prod_{j \in S' \setminus S} \E[\frac{z_j -p}{p}] \\
%&= \prod_{i \in S \setminus S'} \E[f(z_i)]  \prod_{k \in S \cap S'}\E[f(z_k)\frac{z_k -p}{p}]\prod_{j \in S' \setminus S} \E[\frac{z_j -p}{p}] \\
&= \prod_{i \in S \setminus S'} \E[f_i(z_i)]  \prod_{k \in S \cap S'}\left((1-p)(\E[f_k|z_k=1] - \E[f_k|z_k=0])\right)\prod_{j \in S' \setminus S} 0 \\
&= \prod_{i \in S \setminus S'} \E[f_i(z_i)]  \prod_{k \in S \cap S'}\left((1-p)(\E[f_k|z_k=1] - \E[f_k|z_k=0])\right) \Ind[S' \subseteq S]
\end{align*}
The exact same argument can be applied to 
$\E[\prod_{i \in S} z_i \prod_{j \in S'} \frac{p - z_j}{1-p}]$.
\end{proof}

\begin{lemma} \label{lem:help_beta2}
For any node $i$, any nonempty set $S'$ and any set $\M_i$ such that $S' \subseteq \N_i \subseteq \M_i$ and $|S'| \leq \beta$, and any stochastic functions $f_k$,
$$
\E\left[ \prod_{ k \in S'} z_k f_k(z_k) \sum_{\substack{S \subseteq \M_i \\ |S| \leq \beta}} \Big( \prod_{j \in S} \frac{z_j-p}{p} - \prod_{j \in S} \frac{p - z_j}{1-p} \Big)\right]
= \prod_{k \in S'} \E[f_k(1)]
$$
\end{lemma}

\begin{proof}
    
\begin{align*}
\E\left[ \prod_{ k \in S'} z_k f_k(z_k) \sum_{\substack{S \subseteq \M_i \\ |S| \leq \beta}} \Big( \prod_{j \in S} \frac{z_j-p}{p} - \prod_{j \in S} \frac{p - z_j}{1-p} \Big)\right]
&=  \sum_{\substack{S \subseteq \M_i \\ |S| \leq \beta}} \E\left[ \prod_{ k \in S'} z_k f_k(z_k) \prod_{j \in S} \frac{z_j-p}{p} - \prod_{ k \in S'} z_k f_k(z_k) \prod_{j \in S}   \frac{p - z_j}{1-p} \right] \\
%&= \sum_{\substack{S \subseteq \M_i \\ |S| \leq \beta}} \left[ p^{|S'/S]} (\frac{(1-p)^2}{p})^{|S' \cap S|} \Ind [S \subseteq S'] \right]\\
\intertext{Applying Lemma \ref{lem:help_beta} to the functions $z_k f_k(z_k)$, with the roles of $S$ and $S'$ exchanged, and noting that $\E[z_k f_k(z_k)] = p\,\E[f_k(1)]$, we get}
&= \sum_{\substack{S \subseteq \M_i \\ |S| \leq \beta}} \biggl[
p^{|S' \setminus S|} \prod_{k \in S' \setminus S} \E[f_k(1)] \, (1-p)^{|S' \cap S|}\prod_{k \in S \cap S'} \E[f_k(1)] \, \Ind [S \subseteq S'] \\
&\;\;\;\;\;\;\;\;- 
p^{|S' \setminus S|}  \prod_{k \in S' \setminus S} \E[f_k(1)] \, (-p)^{|S' \cap S|}\prod_{k \in S \cap S'} \E[f_k(1)] \, \Ind [S \subseteq S']
\biggr] \\
&\overset{(b)}{=} \prod_{k \in S'} \E[f_k(1)]\sum_{\substack{S \subseteq S' \\ |S| \leq \beta}} \left[
p^{|S' \setminus S|} (1-p)^{|S|}  - 
p^{|S' \setminus S|} (-p)^{|S|} 
\right] \tag{S1}\\
\intertext{\centering $(b)$ follows from the fact that $\M_i \supseteq \N_i \supseteq S'$, so every $S \subseteq S'$ appears in the sum, while $\Ind[S \subseteq S']$ filters out every $S$ that is not a subset of $S'$}
&= \prod_{k \in S'} \E[f_k(1)]\sum_{\substack{S \subseteq S' \\ |S| \leq \beta}} p^{|S'|} \left[
 p^{-|S|} (1-p)^{|S|}  -   p^{-|S|}(-p)^{| S|} 
\right]\\
&= \prod_{k \in S'} \E[f_k(1)]\sum_{\substack{S \subseteq S' \\ |S| \leq \beta}} p^{|S'|} \left[ (\frac{1}{p} - 1)^{|S|}  -  (-1)^{| S|} 
\right] \tag{S2}\\
\intertext{Since $|S'| \leq \beta$, the constraint $|S| \leq \beta$ is redundant. Applying the binomial theorem, and using $|S'| \geq 1$ so that the second term vanishes, we get}
&= p^{|S'|}\prod_{k \in S'} \E[f_k(1)] \left[ \left( 1 + (\frac{1}{p} - 1) \right)^{|S'|} - \left(1 + (-1) \right)^{|S'|} \right] =\prod_{k \in S'} \E[f_k(1)]
\end{align*}
\end{proof}

\begin{lemma}
\label{lem:pinverse}
If the set of instrumental variables $V$ is chosen such that $V_j = \frac{Z_j}{p} - \frac{1-Z_j}{1-p}$, then for the pseudo-inverse estimator ($\hat{c}$) in Equation 3, the $j^{\text{th}}$ component is given by $\hat{c}_i(j) = Y_i\left(\frac{Z_j}{p} - \frac{1-Z_j}{1-p}\right)$.
\end{lemma}
\begin{proof}
Recall that we set $V_j = \frac{Z_j}{p} - \frac{1-Z_j}{1-p}$. Let $X = V Z^T_{\N_i}$, so that $X_{jk} = (\frac{Z_j}{p} - \frac{1-Z_j}{1-p})Z_k$.

By \cref{lem:exp_prod}, we know that $\E[X_{jk}] = \mathbb{I}[j=k]$. Thus the matrix $\E[X]$ is diagonal, with 1 for every variable shared between $V$ and $Z_{\N_i}$ and 0 everywhere else. The pseudo-inverse of such a matrix is the matrix itself.

The $j^{\text{th}}$ entry of the $VY_i$ component of $\hat{c}$ is $(\frac{Z_j}{p} - \frac{1-Z_j}{1-p})Y_i$. Since the pseudo-inverse of $\E[X]$ is diagonal with entries 0 and 1, with $1$ for every variable shared between $V$ and $Z_{\N_i}$, only those components remain. 
Thus $\hat{c}_i(j) = Y_i\left(\frac{Z_j}{p} - \frac{1-Z_j}{1-p}\right)$ for every index $j$ shared between $V$ and $Z_{\N_i}$.
Thus the treatment effect estimate is $\hat{\tau} = \frac{1}{n} \sum_i \sum_j \hat{c}_i(j) = \frac{1}{n} \sum_i Y_i  \sum_{j \in V} \left(  \frac{z_j}{p} - \frac{(1-z_j)}{(1-p)}  \right)$.
\end{proof}


We prove a more general result than the statement in the paper. 
\begin{thm}
Consider an additive model of the form $Y_i(\bz) = \sum_{S' \subset \N_i} c_{i,S'} \prod_{j \in S'} \Ind[z_j A_{ij} = 1]$. Here each subset of neighbours has an influence which occurs only when all the corresponding edges connect to $i$. Under such a model the global average treatment effect (GATE) is given by $\tau = \frac{1}{n} \sum_i \sum_{\emptyset \neq S' \subset \N_i} c_{i,S'} \prod_{j \in S'} \E[A_{ij}|\bz = 1]$; the term $S' = \emptyset$ is a baseline that cancels in the contrast between full treatment and full control.
If $\M_i \supseteq \N_i$, then $\hat{\tau}^\beta = \frac{1}{n} \sum_i Y_i \sum_{\substack{S \subseteq \M_i \\ |S| \leq \beta}} \Big( \prod_{j \in S} \frac{z_j-p}{p} - \prod_{j \in S} \frac{p - z_j}{1-p} \Big)$ is unbiased for $\tau$.
\end{thm}

\begin{proof}
If $Y_i(\bz) = \sum_{S' \subset \N_i} c_{i,S'} \prod_{j \in S'} \Ind[z_j A_{ij} = 1]$ then for $\hat{\tau}^\beta$ we get

\begin{align*}
    \E[\hat{\tau}^\beta] &= \E \left[ \frac{1}{n} \sum_i Y_i \sum_{\substack{S \subseteq \M_i \\ |S| \leq \beta}} \Big( \prod_{j \in S} \frac{z_j-p}{p} - \prod_{j \in S} \frac{p - z_j}{1-p} \Big) \right] \\
    &= \E \left[ \frac{1}{n} \sum_i \sum_{S' \subset \N_i} c_{i,S'} \prod_{j \in S'} \Ind[z_j A_{ij} = 1] \sum_{\substack{S \subseteq \M_i \\ |S| \leq \beta}} \Big( \prod_{j \in S} \frac{z_j-p}{p} - \prod_{j \in S} \frac{p - z_j}{1-p} \Big) \right] \\
    &= \frac{1}{n} \sum_i \E \left[   \sum_{S' \subset \N_i} c_{i,S'} \prod_{j \in S'} z_j A_{ij}  \sum_{\substack{S \subseteq \M_i \\ |S| \leq \beta}} \Big( \prod_{j \in S} \frac{z_j-p}{p} - \prod_{j \in S} \frac{p - z_j}{1-p} \Big) \right] \\
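    \intertext{Now applying Lemma \ref{lem:help_beta2} to each nonempty $S'$ with $f_j(z_j) = A_{ij}$ (the term $S' = \emptyset$ has expectation zero, since the $S = \emptyset$ summand vanishes and every other summand has mean zero), we get} 
    &= \frac{1}{n} \sum_i \sum_{\emptyset \neq S' \subset \N_i} c_{i,S'} \prod_{j\in S'}\E[A_{ij}(1)]  = \tau(\vec{1},\vec{0})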
\end{align*}
\end{proof}

\paragraph{Proof of Proposition 5.2}
Unbiasedness of $\hat{\tau}_{OIV}$ follows directly from Theorem A.1 by noting that a) the $\mathcal{M}_i$ in the statement of Proposition 5.2 satisfies the superset criterion in Theorem A.1 and b) when $\beta=1$, $\hat{\tau}^\beta = \hat{\tau}_{OIV}$.
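
Point b) can be checked directly. The following is a short verification sketch, assuming that $\hat{\tau}_{OIV}$ has the form obtained in Lemma \ref{lem:pinverse} with the instruments indexed by $\M_i$. For $\beta = 1$ the sum ranges over $S = \emptyset$ and the singletons $S = \{j\}$. For $S = \emptyset$ both products are empty and equal to $1$, so they cancel. For $S = \{j\}$,
\[
\frac{z_j - p}{p} - \frac{p - z_j}{1-p} = (z_j - p)\Big( \frac{1}{p} + \frac{1}{1-p} \Big) = \frac{z_j - p}{p(1-p)} = \frac{z_j}{p} - \frac{1-z_j}{1-p},
\]
where the last step uses $z_j(1-p) - p(1-z_j) = z_j - p$. Hence
\[
\hat{\tau}^{1} = \frac{1}{n} \sum_i Y_i \sum_{j \in \M_i} \Big( \frac{z_j}{p} - \frac{1-z_j}{1-p} \Big).
\]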




\begin{lemma} \label{lem:help_beta3}
Consider the linear outcome model $Y_i(\bz) = b_i + c_{ii}Z_i + \sum c_{ij} \mathbb{I}[A_{ij}=1] Z_j$.  
For any set $S$, let $Q = Y_i \sum_{j \in S} \bigl[ \left(\frac{z_j}{p} - \frac{1-z_j}{1-p}\right) \bigr]$. Then
$$\E[Q] = \sum_{j \in S} c_{ij} \E[A_{ij}|Z_j=1] + \mathbb{I}[i \in S] c_{ii}$$
\end{lemma}
\begin{proof}
    \begin{align*}
    \E[Q] &= \E \bigl[ Y_i \sum_{j \in S} \bigl[ \left(\frac{z_j}{p} - \frac{1-z_j}{1-p}\right) \bigr]\bigr] \\
    &= \E \bigl[\left(b_i + c_{ii}Z_i + \sum c_{ij} \mathbb{I}[A_{ij}=1] Z_j\right) \sum_{j \in S} \bigl[ \left(\frac{z_j}{p} - \frac{1-z_j}{1-p}\right) \bigr]\bigr]\\
    &= \E \bigl[b_i\sum_{j \in S} \bigl[ \left(\frac{z_j}{p} - \frac{1-z_j}{1-p}\right) \bigr]\bigr] + \E \bigl[c_{ii}Z_i\sum_{j \in S} \bigl[ \left(\frac{z_j}{p} - \frac{1-z_j}{1-p}\right) \bigr]\bigr] \\
    &\;\;\;\;\;\;\;\;+ \E \bigl[ \sum c_{ij} \mathbb{I}[A_{ij}=1] Z_j \sum_{j \in S} \bigl[ \left(\frac{z_j}{p} - \frac{1-z_j}{1-p}\right) \bigr]\bigr] \\
        \intertext{Now applying Lemma \ref{lem:help_beta2}  we get} 
    &= 0 + c_{ii}\mathbb{I}[ \{i\} \subseteq S]
     + \sum_j c_{ij}\E[A_{ij}(1)]\mathbb{I}[ \{j\} \subseteq S]
     = \sum_{j \in S} c_{ij} \E[A_{ij}|Z_j=1] + \mathbb{I}[i \in S] c_{ii}
    \end{align*}
\end{proof}

\paragraph{Proof of Proposition 5.6}
Applying Lemma A.5, we get that $\E\bigl[ Y_i\sum_{j \in \mathcal{M}^c_i} \left(\frac{z_j}{p} - \frac{1-z_j}{1-p}\right) \bigr] = \sum_{j \in \mathcal{M}^c_i} c_{ij} \E[A_{ij}|Z_j=1]$.
By the homogeneity assumption, the $c_{ij}$ are all equal for a given $i$; denote their common value by $k_i$. Furthermore, by conservation of $\mathcal{M}^c_i$, the edges are always present.
Thus $\E\bigl[ Y_i\sum_{j \in \mathcal{M}^c_i} \left(\frac{z_j}{p} - \frac{1-z_j}{1-p}\right) \bigr] = k_i |\mathcal{M}^c_i|$.
Next, as argued in Section 5.3, to get the treatment effect we can rescale this quantity by $C_i = \frac{1}{p}\sum_j   Z_j  A_{ij}$ to get an unbiased $\hat{\tau}$.

%\subsection{Variance Bounds}
\subsection{Statistical Inference}
The results so far focused on providing point estimates of the treatment effect. However, in practice, one needs reasonable confidence intervals around these estimates to handle statistical uncertainty and to perform hypothesis tests that verify assumptions. For this purpose, we first argue that these estimators are asymptotically normal.

Generalized central limit theorems \citep{ross2011fundamentals} assure us that the sum of $n$ bounded random variables $R_i$ asymptotically behaves like a Gaussian distribution if the variables are mostly independent; specifically, if we construct their dependency graph, then it must not be too dense\footnote{For the exact statement we refer the readers to Theorem 3.6 from~\citet{ross2011fundamentals}.}. The dependency graph in our case is provided by the network itself. Hence, as long as the underlying interference network is sparse, these estimators are asymptotically normal. 
The normality of these estimators suggests a way to do statistical inference. If we can get an upper bound for the variance, then we can construct conservative Wald-type intervals \citep{wasserman2006all}. We should note, however, that since the convergence is asymptotic, the use of the aforementioned variance for confidence intervals is only approximately valid. 


Next we provide such conservative bounds for variance of these estimators.



%Next, we argue that this estimator is asymptotically normal. For this we rely on a classic result in generalized central limit theorems \citep{ross2011fundamentals}. Informally, for a set of $n$ bounded random variables $R_i$, if the dependency graph is not too dense, then the variance normalized sum, $\sum R_i/\sqrt(\Var(\sum R_i))$ approach a gaussian distribution. We can clearly see that the estimator $\hat{\tau}^\beta$ (as well as other variants) can be written as a sum of such random variables. The dependence between the variables is represented by the neighbourhoods in  $\M$. As such if $\M$ is not too dense, $\hat{\tau}^\beta$ is asymptotically normal. Further note that while we assume that the max-degree of $\M$ i.e $d_\M$ is constant, the exact statement of the theorem allows sub-polynomial growth in the degree \footnote{For the exact statement we refer the readers to Theorem 3.6 from ~\citet{ross2011fundamentals}}.


Let the matrix \( A \in \{0,1\}^{ n \times n}\) denote the dependency graph. We consider the linear additive model (\textbf{A4}). 
Since $A_{ij}$ depends on $Z_j$, we model it as \( A_{ij} \sim \text{Bernoulli}(q_1) \) if \( Z_j = 1 \), and \( A_{ij} \sim \text{Bernoulli}(q_0) \) if \( Z_j = 0 \). 
We assume that the max degree of any node is $\Delta$. Thus, for each node, the aforementioned Bernoulli model only applies to $\Delta$ nodes.
Furthermore, we also assume that $|\mathcal{M}^c_i|$ is bounded by $\Delta_{\M^c}$. 
Finally we have \( Z_j \sim \text{Bernoulli}(p) \).

Outcomes \( Y_i \) are given by \( Y_i = c_i^\top A Z \), where \( c_i \) is \( \Delta \)-sparse (only \( \Delta \) non-zero entries). We assume that we know an upper bound $C$ for $|c_{ij}|$. 
We focus on the \textbf{UIV Case} as it is more complex and the bound for OIV case can be derived from the bounds in this Section.

The estimator $\hat{\tau}_{UIV}$ is given by:
  \[
  \hat{\tau}_{UIV} = \frac{1}{n} \sum_{i=1}^n Y_i \left( \sum_{j \in \M^c_i} \left( \frac{Z_j}{p} - \frac{1-Z_j}{1-p} \right) \sum_{r=1}^n Z_r A_{ir} \right).
  \]

%\subsection*{Step 1: Decompose \( T \)}
Let \( S_i = \sum_{j \in \M^c_i} \left( \frac{Z_j}{p} - \frac{1-Z_j}{1-p} \right) \) and \( R_i = \sum_{r=1}^n Z_r A_{ir} \). Then:
\[
\hat{\tau}_{UIV}= \frac{1}{n} \sum_{i=1}^n Y_i S_i R_i.
\]

%\subsection*{Step 2: Bounding the Variance}
Now
\begin{align}
\text{Var}(\hat{\tau}_{UIV}) = \frac{1}{n^2} \left[ \sum_{i=1}^n \text{Var}(Y_i S_i R_i) + 2 \sum_{i < j} \text{Cov}(Y_i S_i R_i, Y_j S_j R_j) \right]. \label{app:eq:cov}
\end{align}

First we go about bounding \( \text{Var}(Y_i S_i R_i) \).
Since \( Y_i = \sum_{m \in \N_i} c_{im} \sum_{l=1}^n A_{ml} Z_l \) (with \( |\N_i| \leq \Delta \)):
  \[
  |Y_i| \leq C \sum_{m \in \N_i} \sum_{l=1}^n A_{ml} Z_l \leq C \Delta.
  \]
  The second moment satisfies:
  \[
  \mathbb{E}[Y_i^2] \leq C^2 \mathbb{E}\left[\left(\sum_{l=1}^n A_{ml} Z_l\right)^2\right] \leq C^2 \Delta^2 p^2 q_1^2.
  \]

Next we consider bounding \( S_i \): Each term in \( S_i \) is mean 0 and has variance bounded by
  \[
  \text{Var}(S_i) = \sum_{j \in \M^c_i} \text{Var}\left( \frac{Z_j}{p} - \frac{1-Z_j}{1-p} \right)  \leq \Delta_{\M^c} \max\left(\frac{1}{p},\frac{1}{1-p}\right)
  \]

The sum \( R_i = \sum_{r=1}^n Z_r A_{ir} \) involves at most \( \Delta \) non-zero terms instead of \( n \), which gives:
    \[
    \mathbb{E}[R_i] = \Delta p q_1,
    \]
    \[
    \text{Var}(R_i) \leq \Delta pq_1(1-pq_1).
    \]

Using Cauchy-Schwarz:
\[
\text{Var}(Y_i S_i R_i) \leq \mathbb{E}[(Y_i S_i R_i)^2] \leq C^2 \Delta^3 \Delta_{\M^c} p^5 q_1^4  (1-pq_1) \frac{1}{\min(p,1-p)}
\]

Next we bound the covariance terms in \cref{app:eq:cov}. For \( i \neq j \), the covariance \( \text{Cov}(Y_i S_i R_i, Y_j S_j R_j) \) is non-zero only if \( Y_i \) and \( Y_j \) share dependencies, which happens only if \( \N_i \) and \( \N_j \) overlap. Given \( \Delta \)-sparsity, each \( Y_i \) interacts with at most \( \Delta \) other terms. Thus:
\[
\sum_{i < j} \text{Cov}(Y_i S_i R_i, Y_j S_j R_j) \leq  n \Delta \cdot \text{Var}(Y_i S_i R_i).
\]


Combining terms we get:
\[
\text{Var}(\hat{\tau}_{UIV}) \leq \frac{1}{n^2} \left[ n \cdot \text{Var}(Y_i S_i R_i) + 2n \Delta\cdot \text{Var}(Y_i S_i R_i) \right].
\]
Substituting the bound we get:
\[
\text{Var}(\hat{\tau}_{UIV}) \leq  \frac{1}{n}(2\Delta +1)C^2 \Delta^3 \Delta_{\M^c} p^5 q_1^4  (1-pq_1) \frac{1}{\min(p,1-p)}
\]

We can follow a similar argument for the \textbf{OIV} case, except that in that case $R_i=1$. Following the same steps as before, we get the following result:

\[
\text{Var}(\hat{\tau}_{OIV}) \leq  \frac{1}{n}(2\Delta +1)C^2 \Delta^2 \Delta_{\M}^2 p^4 q_1^3 \frac{1}{\min(p,1-p)}
\]


\subsection{Multi-Trial Estimation}

By a similar argument as in Section 5.4 (\cref{eqn:insight}) we can see that under linear additive interference:
$$
\E[Y_i] = b_i + c_{ii} p + \sum_{j}  c_{ij} \E[A_{ij}(1)] p
$$
which is very similar to the treatment effect, except for the additional term $b_{i}$ and the scaling by factor of $p$. 
We further note that while an individual $Y_i$ might be very noisy and far from its expected value, we can still obtain a good estimate of $\E[\sum_i Y_i]$.

For this we rely on a classic result in generalized central limit theorems \citep{ross2011fundamentals}. Informally, for a set of $n$ bounded random variables $R_i$, if their dependency graph is not too dense, then the variance normalized sum approaches a normal distribution. If we consider $Y_i$ to be these random variables, their dependency graph is represented by the matrix $\bA$. If $\bA$ is not too dense under any counterfactual, then $\frac{1}{n} \sum_i^n Y_i$ is asymptotically normal with mean $\frac{1}{n} \sum_i^n \E[Y_i]$.


On the other hand, we know that $\E[Y_i]$ is linear in $p$. Let $F(Y) =  \frac{1}{n}\sum_i Y_i$; then $\E[F(Y)]  = \frac{1}{n} \sum_i \left[ b_i + \left( c_{ii} + \sum_{j}  c_{ij} \E[A_{ij}(1)] \right) p \right] = \frac{1}{n} \sum_i b_i + \tau p$.

\begin{remark}
This holds for more complex interaction models. More specifically, if the size of the set of all possible neighbours under all possible interactions (i.e., $\mathcal{M}^c_i$) is bounded by a number $\beta$, then
$\frac{1}{n} \sum_{i=1}^n Y_i$ asymptotically converges to a polynomial of order $\beta$ in $p$.
\end{remark}
    

Under the linear interference model, we can conduct two experiments at two different randomization probabilities $p_1$ and $p_2$, and fit a linear function in $p$ to the resulting mean outcomes. Let that function be $\hat{F}$. By the earlier argument, $\hat{F}(p)$ is an unbiased and consistent estimate of $F(p)$, the expected mean outcome at randomization probability $p$. By definition, the global treatment effect $\tau$ is given by $F(1) - F(0)$.
Thus we have the following estimator:
$$ \hat{\tau}_{MULTI} = \hat{F}(1) - \hat{F}(0)$$
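
As an illustration of the two-point fit, the sketch below writes $\bar{Y}(p_k)$ for the observed mean outcome $\frac{1}{n}\sum_i Y_i$ in the trial run at probability $p_k$; this notation is introduced here for illustration, and we assume $p_1 \neq p_2$. The line through the two observed points is
\[
\hat{F}(p) = \bar{Y}(p_1) + \frac{\bar{Y}(p_2) - \bar{Y}(p_1)}{p_2 - p_1}\,(p - p_1),
\]
so the estimator reduces to the fitted slope:
\[
\hat{\tau}_{MULTI} = \hat{F}(1) - \hat{F}(0) = \frac{\bar{Y}(p_2) - \bar{Y}(p_1)}{p_2 - p_1}.
\]
Since the expected mean outcome is affine in $p$ with slope $\tau$ under the linear model, the expectation of this slope is $\tau (p_2 - p_1)/(p_2 - p_1) = \tau$. For example, with the hypothetical values $p_1 = 0.2$, $p_2 = 0.6$, $\bar{Y}(p_1) = 1.3$ and $\bar{Y}(p_2) = 2.1$, the slope is $(2.1 - 1.3)/(0.6 - 0.2) = 2$, so $\hat{\tau}_{MULTI} = 2$.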


\begin{prop}
Under assumptions \textbf{A1-4}, and assuming multiple independent trials, $\hat{\tau}_{MULTI}$ is an unbiased estimate of the treatment effect $\tau$.
\end{prop}

\section{Treatment Dependent Networks}
\subsection{Failure of HT Estimation}

   \begin{figure}[htp]
    \centering
    \begin{tikzpicture}[
      scale=0.1,
      node distance=1cm and 0cm,
      observed_node/.style={minimum size=1cm,fill=lightgray,text=black,draw=black,circle,text width=0.5cm,align=center},
      deterministic_observed_node/.style={minimum size=1cm,fill=lightgray,text=black,draw=black,circle,text width=0.5cm,align=center, double=none, double distance=1pt, even odd rule},
      hidden_node/.style={minimum size=1cm,fill=white,text=black,draw=black,circle,text width=0.5cm,align=center},
      deterministic_hidden_node/.style={minimum size=1cm,fill=white,text=black,draw=black,circle,text width=0.5cm,align=center, double, double distance=1pt},
      text_only_node/.style={minimum size=0.001cm,fill=white,text=black,draw=white,circle,text width=0.05cm,align=center},
    ]
   %% \node[hidden_node] at (30,-10*0.866) (E) {$E$};
   %% \node[observed_node] at (10,-20*0.866)  (E_star) {$\tilde{E}$};
   %% \node[observed_node] at (15,0)  (Y) {$Y$};
   %% \node[observed_node] at (45,0) (X) {$X$};
   %% \node[observed_node] at (50,-20*0.866) (Z) {$Z$};
   
    \node[hidden_node] at (30,0.866) (U) {$U$};
    \node[hidden_node] at (15,-30*0.866)  (L) {$L$};
    \node[hidden_node] at (45,-30*0.866) (R) {$R$};
    \path %(Z) edge[-latex] (X)
    %(X) edge[-latex] (E)
    (U) edge[-latex] (L)
    (U) edge[-latex] (R);
    \end{tikzpicture}
    \caption{Three-node example in which node $U$ is connected to either $L$ or $R$, depending on the treatment $Z_U$.}
    \label{fig:counter1}
\end{figure}

Consider the three-node graph in \cref{fig:counter1}. The edge $UL$ exists if and only if $Z_U=1$; otherwise the edge $UR$ exists. However, the outcomes at $L$ and $R$, i.e., $(Y_L,Y_R)$, are independent of the treatment at $U$ and depend only on the node's own treatment, with a constant effect $\alpha$, i.e., the outcomes are $Y_{L/R}(1) = Y_{L/R}(0) + \alpha_{L/R}$. All treatments are perfectly randomized with probability $q=0.5$.


We consider the TTE (GATE) between $Z=\vec{0}$ and $Z=\vec{1}$ with the HT estimate here. By symmetry, we can consider only $U,L$, with the $U,R$ case being analogous.
There are 4 possible treatment assignments, with the corresponding values of the HT estimator being
$$\text{For }Z_U=1, Z_L=1\text{ we have} \qquad Y_L(1) \left( \frac{1}{0.25} \right)$$
$$\text{For }Z_U=1, Z_L=0\text{ we have} \quad Y_L(0) \left( \frac{1 \cdot 0}{0.25} - \frac{0 \cdot 1}{0.25} \right) = 0$$
$$\text{For }Z_U=0, Z_L=1\text{ we have} \qquad Y_L(1) \left( \frac{1}{0.5} \right)$$
$$\text{For }Z_U=0, Z_L=0\text{ we have} \qquad Y_L(0) \left( -\frac{1}{0.5} \right)$$

The expected contribution of node $L$ over all four assignments is $Y_L(1) + \frac{Y_L(1) - Y_L(0)}{2}$.
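
This expectation can be verified by weighting the four cases above by their probabilities. Since $Z_U$ and $Z_L$ are independent with $q = 0.5$, each assignment has probability $0.25$:
\begin{align*}
0.25 \cdot \frac{Y_L(1)}{0.25} + 0.25 \cdot 0 + 0.25 \cdot \frac{Y_L(1)}{0.5} - 0.25 \cdot \frac{Y_L(0)}{0.5}
&= Y_L(1) + \frac{Y_L(1)}{2} - \frac{Y_L(0)}{2} \\
&= Y_L(1) + \frac{Y_L(1) - Y_L(0)}{2}.
\end{align*}
As a numeric check with the hypothetical values $Y_L(0) = 1$ and $\alpha_L = 2$ (so that $Y_L(1) = 3$), the contribution is $3 + (3 - 1)/2 = 4$.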

\begin{comment}
$$\text{For }T_U=0, T_R=1\text{  we have} \qquad Y_R(1) ( \frac{1*0}{0.25} - \frac{0*1}{0.25}) = 0$$
$$\text{For }T_U=0, T_R=0\text{  we have} \qquad Y_R(0) ( -\frac{1}{0.25})$$
$$\text{For }T_U=1, T_R=1\text{  we have} \qquad Y_R(1) (\frac{1}{0.5})$$
$$\text{For }T_U=1, T_R=0\text{  we have} \qquad Y_R(0) ( -\frac{1}{0.5})$$
\end{comment}

Similarly, the contribution from node $R$ is $( -Y_R(0) + \frac{Y_R(1) - Y_R(0)}{2})$. 

Hence, the expected value of the estimator is given by $\frac{Y_L(1) - Y_R(0)}{2} + \frac{\alpha_L + \alpha_R}{4}$. 
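
The last step can be unpacked as follows. This sketch assumes that the estimator averages the contributions of the two leaf nodes $L$ and $R$; the normalization by $2$ is inferred from the stated result rather than spelled out in the text. Using $Y_{L/R}(1) - Y_{L/R}(0) = \alpha_{L/R}$:
\begin{align*}
\frac{1}{2} \left[ Y_L(1) + \frac{\alpha_L}{2} - Y_R(0) + \frac{\alpha_R}{2} \right]
= \frac{Y_L(1) - Y_R(0)}{2} + \frac{\alpha_L + \alpha_R}{4}.
\end{align*}
The first term depends on the outcome levels $Y_L(1)$ and $Y_R(0)$, and not only on the effects $\alpha_L$ and $\alpha_R$. For instance, with the hypothetical values $Y_L(0) = 5$, $Y_R(0) = 1$ and $\alpha_L = \alpha_R = 2$, the expected value is $(7 - 1)/2 + 4/4 = 4$, whereas the effect at each leaf is only $2$.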
%On the other hand, the true treatment effect is $(\alpha_L + \alpha_R)/2$. Thus we can see that the HT estimator is biased.


\subsection{A More Formal Explanation}
We now explain the issue with HT estimation a bit more formally. For simplicity, we consider the linear interference model. We begin with the following result from \citet{sussman2017elements}.

Under assumptions \textbf{A1-2, A4}, the HT estimator and the HATE estimator $\hat{\tau}_{HATE} = \frac{1}{n} \sum_i Y_i  \sum_{j \in \mathcal{N}_i} \left(  \frac{z_j}{p} - \frac{(1-z_j)}{(1-p)}  \right)$ have the same expected value.

We first look at how assumption \textbf{A4} affects $\tau_{\text{HT}}$. Substituting \textbf{A4} into \cref{eq:tau_ht}, $\tau_{\text{HT}}$ can be expressed as
\begin{small}
\begin{align*}
% \tau_{\text{HT}} =
\frac{1}{n} \sum_i \left[ c_{i} + \sum_{j \in \mathcal{N}_i} c_{i,j}z_j \right] \left(  \prod_{k \in \mathcal{N}_i} \frac{z_k}{p} - \prod_{k \in \mathcal{N}_i} \frac{(1-z_k)}{(1-p)} \right).
\end{align*}
\end{small}
Now observe that, as the \textit{allocation} at each unit is independent, for any functions $g$ and $h$ and any $i \neq j$: $\E[h(z_i) g(z_j)] = \E[h(z_i)]\E[g(z_j)]$. Furthermore, as $\E[z_k/p] = \E[(1-z_k)/(1-p)] = 1$, we can ignore all the ratio terms for $k \neq j$ (see Lemma A.1).
% \begin{small}
% \begin{align*}
% % \displaystyle  
% \E[\tau_{HT}] = \frac{1}{n} \sum_i \E \left[ [ c_{i} + \sum_{j \in \mathcal{N}_i} c_{i,j}\mathbb{I}[{z_j=1}] ] \left(  \prod_{k \in \mathcal{N}_i} \frac{z_k}{p} - \prod_{k \in \mathcal{N}_i} \frac{(1-z_k)}{(1-p)} \right) \right]
% \end{align*}
% \end{small}
Therefore, $\E[\tau_{\text{HT}}]$ can be simplified as
{\small
\begin{align*}
\E[\tau_{\text{HT}}] = \frac{1}{n} \sum_i \E \left[ [ c_{i} + \sum_{j \in \mathcal{N}_i} c_{i,j}z_j ] \left(  \frac{z_j}{p} - \frac{(1-z_j)}{(1-p)} \right) \right],
\end{align*}
}%
which is a linear combination of the terms $z_j/p$ and $(1-z_j)/(1-p)$; however, this expression cannot be computed from only the graph and the observed outcomes $Y_i$.
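
The identity $\E[z_k/p] = \E[(1-z_k)/(1-p)] = 1$ used above can be checked directly. Assuming, as in the Bernoulli design, that $z_k \in \{0,1\}$ with $\Pr(z_k = 1) = p$ and $0 < p < 1$:
\begin{align*}
\E\left[\frac{z_k}{p}\right] &= p \cdot \frac{1}{p} + (1-p) \cdot \frac{0}{p} = 1, \\
\E\left[\frac{1-z_k}{1-p}\right] &= p \cdot \frac{0}{1-p} + (1-p) \cdot \frac{1}{1-p} = 1.
\end{align*}
For the remaining index $j$, note that $z_j^2 = z_j$ and $z_j (1-z_j) = 0$, so $\E\left[ z_j \left( \frac{z_j}{p} - \frac{1-z_j}{1-p} \right) \right] = \frac{p}{p} - 0 = 1$.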

We will rewrite this expression in terms of $Y_i$.  Observe that since  $z_j \indep z_i \; \forall i\neq j$, we can add terms of the form $z_i \left(\frac{z_j}{p} - \frac{1-z_j}{1-p}\right) \text{ with  } i \neq j$ without changing the expected value. Adding in such terms to include every node in $\N_i$, we get {
\begin{align*}
\E[\tau_{\text{HT}}] = \frac{1}{n} \sum_i \E \left[ \left( c_{i} + \sum_{j \in \mathcal{N}_i} c_{i,j}z_j \right) \sum_{ k \in \N_i} \left( \frac{z_k}{p} - \frac{(1-z_k)}{(1-p)} \right) \right]
\end{align*}
}%
%This also implied that $ \E \left[ h(z_i) \left(\frac{z_j}{p} - \frac{1-z_j}{1-p}\right) \right] = 0$. Hence if $Y_i$ is multiplied by $\left(\frac{z_k}{p} - \frac{1-z_k}{1-p}\right)$ where $k$ is not a node which influences outcome at node $i$, then the contribution of such a term is 0 in expectation. 

%\yc{Optional: I like the previous subsection. This subsection still feels abrupt. Maybe consider starting the section with the intuition and then stating the estimator and the theorem.}
which motivates the following estimator:
\begin{align}
\label{eq:te_full_app}
\hat{\tau} = \frac{1}{n} \sum_i Y_i  \sum_{j \in \mathcal{M}_i} \left(  \frac{z_j}{p} - \frac{(1-z_j)}{(1-p)}  \right).
\end{align}
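
The step of adding cross terms relies on those terms having zero mean. As a brief check, for a generic function $h$ (a placeholder, as in the independence statement above) and any $i \neq j$ with independent allocations:
\[
\E \left[ h(z_i) \left( \frac{z_j}{p} - \frac{1-z_j}{1-p} \right) \right]
= \E[h(z_i)] \left( \E\left[\frac{z_j}{p}\right] - \E\left[\frac{1-z_j}{1-p}\right] \right)
= \E[h(z_i)] \, (1 - 1) = 0.
\]
Hence multiplying $Y_i$ by such a factor for a node that does not influence the outcome at $i$ contributes nothing in expectation, provided the set of nodes summed over does not itself depend on the allocation.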

Now comes the crucial issue: the above derivation obscures the fact that $\N_i$ depends on $\bz$. Specifically, a node $j$ belongs to $\N_i$ if $A_{ij} = 1$. But since the $A_{ij}$ are themselves potential outcome functions dependent on $\bz$, this neighbourhood is not static. One includes node $j$ in the above sum if
the edge was observed, and the probability of observing the edge is different for $z_j=0$ and $z_j=1$. Hence the more appropriate expression is

\begin{align}
\label{eq:te_full_app1}
\hat{\tau} = \frac{1}{n} \sum_i Y_i  \sum_{j} \left( \mathbb{I}[A_{ij}=1|z_j=1]\left(  \frac{z_j}{p} \right) - \mathbb{I}[A_{ij}=1|z_j=0]\left( \frac{(1-z_j)}{(1-p)}  \right) \right).
\end{align}

Unlike $\left(\frac{z_j}{p} - \frac{1-z_j}{1-p}\right)$, this term does not necessarily have mean 0; its mean is instead $\E[A_{ij}|z_j=1] - \E[A_{ij}|z_j=0]$, and including this term in the regression biases the estimate.
This is why we needed either $\mathcal{M}_i$ (superset) or $\mathcal{M}^c_i$ (subset). In either case, the neighbourhood over which the $\left(\frac{z_j}{p} - \frac{1-z_j}{1-p}\right)$ terms are added does not depend on $\bz$. $\hat{\tau}_{OIV}$ includes edges irrespective of their $A_{ij}$ value (or, more specifically, skips terms only if $A_{ij}(z_j)=0$ always). On the other hand, $\hat{\tau}_{UIV}$ only includes edges if $A_{ij}(z_j)=1$ always (and hence independently of $\bz$).
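
To see where the non-zero mean comes from, the following sketch assumes, for illustration only, that the edge depends on the allocation solely through $z_j$, with fixed potential values $A_{ij}(1)$ and $A_{ij}(0)$. The weight attached to node $j$ in \cref{eq:te_full_app1} is then $A_{ij}(1)\frac{z_j}{p} - A_{ij}(0)\frac{1-z_j}{1-p}$, and taking the expectation over $z_j$ with $\Pr(z_j = 1) = p$ gives
\[
\E \left[ A_{ij}(1) \frac{z_j}{p} - A_{ij}(0) \frac{1-z_j}{1-p} \right]
= A_{ij}(1) \cdot \frac{p}{p} - A_{ij}(0) \cdot \frac{1-p}{1-p}
= A_{ij}(1) - A_{ij}(0).
\]
This vanishes when the edge does not respond to $z_j$, i.e. when $A_{ij}(1) = A_{ij}(0)$. For the edge $UL$ in the earlier three-node example, which exists if and only if $Z_U = 1$, the difference equals $1 - 0 = 1$.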

\begin{comment}
\subsection{}
Now we switch back to the regression perspective that was presented in the paper. We wrote the following regression model to relate the outcomes \(\mathbf{Y}_i\) and the independent variables \(\mathbf{Z}_i\):

\[
\underbrace{\begin{bmatrix}
    Y_i^1 \\
    Y_i^2 \\
    \vdots \\
    Y_i^r 
\end{bmatrix}}_{r\times 1}
\;=\;
\underbrace{\begin{bmatrix}
    1 &  \bigl(Z_{N(i)}^1\bigr)^\top \\
    1 &  \bigl(Z_{N(i)}^2\bigr)^\top \\
    \vdots & \vdots \\
    1 &  \bigl(Z_{N(i)}^r\bigr)^\top
\end{bmatrix}}_{r\times d}
\underbrace{\begin{bmatrix}
b_i \\
    c_{ii} \\
    \vec{c}_i
\end{bmatrix}}_{d\times 1}
\quad \Rightarrow \quad
\bY_i \;=\; \bZ_i \,\bc_i.
\]


The network structure may change from trial to trial as each trial has a different $\bz$. Consequently, for each experiment \(r\), the set of neighbors \(N(i)\) can vary, leading to different observed components in \(Z_{N(i)}^r\).
From the earlier discussion the variables observed are not $z_j$ but $\phi[A_{ij}=1|z_j] z_j$, where $\phi$ is similar to the indicator function $\mathbb{I}$, except it outputs $NaN$ or missing instead of 0, when the condition is false. Thus in the regression perspective we end up having to estimate with missing data.
\end{comment}
\section{Experimental Details}

\subsection{Synthetic Graphs}
\label{apx:synth}
%The Erdos-Renyi (ER) model is commonly used for analyzing interaction networks in  various experimental settings, particularly in the realm of social media \citep{seshadhri2012community} and epidemic control \citep{kephart1992directed,wang2003epidemic}.  In social media platforms, where connections form organically, ER graphs provide a reasonable simulation of how friendships, followerships, or interactions might evolve in an online community \citep{erdos1960evolution}. Additionally, in the context of epidemic control, ER graphs are valuable for studying disease spread \citep{wang2003epidemic}. 

We sample different random graphs and run repeated experiments on these graphs with randomized Bernoulli treatment assignment. The baselines include the POLY(Prop/Num) estimator, which is a polynomial regression on the exposure as computed by the fraction/number of treated nodes in the neighbourhood. The DM estimator is the classic difference-in-means (SUTVA) estimator, which is simply the difference between the average outcomes on treated and untreated units.
The ER graphs are generated with an expected neighbourhood size of 20. The outcome model is similar to the potential outcomes model in \cite{YuCortezEichhorn22}:
\begin{equation}
    Y_i(\bz) = c_{i,\emptyset} + \sum_{j\in \N_i} \tilde{c}_{i,1}z_j + \sum_{\ell = 2}^{\beta} \left( \frac{\sum_{j \in \N_i}\tilde{c}_{ij,2} a_{ij} z_j}{\sum_{j \in \N_i}\tilde{c}_{ij,2}} \right)^{\ell},
\end{equation}
where $i \neq j$ and $\tilde{c}_{ij,2} = v_{i,2} |\N_i|/\sum_{k: (k,j) \in E} |\N_k|$. The coefficients $c_{i,\emptyset}, \tilde{c}_{i,1}, v_{i,2}$ are drawn as random variables.
%a linear function of the covariates $X_i$.

%We also provide the signed relative bias and RMSE plots from these experiments in Figure \ref{fig:apx:ablation_er}


%Neighbourhoods are determined by a the Poly and EXP require oracle neighbourhoods which can be obtained using latitude and longitude information. However, for our method we will not leverage such information for identification of the interfering neighbourhood sites, and instead use clusters based on the 9 census divisions as specified by the US Census Bureau.




