%\documentclass{uai2025} % for initial submission
\documentclass[accepted]{uai2025} % after acceptance, for a revised version; 
% also before submission to see how the non-anonymous paper would look like 
                        
%% There is a class option to choose the math font
% \documentclass[mathfont=ptmx]{uai2025} % ptmx math instead of Computer
                                         % Modern (has noticeable issues)
% \documentclass[mathfont=newtx]{uai2025} % newtx fonts (improves upon
                                          % ptmx; less tested, no support)
% NOTE: Only keep *one* line above as appropriate, as it will be replaced
%       automatically for papers to be published. Do not make any other
%       change above this note for an accepted version.

%% Choose your variant of English; be consistent
\usepackage[american]{babel}
% \usepackage[british]{babel}

%% Some suggested packages, as needed:
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
% \usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams

%% Provided macros
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)

%% Self-defined macros
\newcommand{\swap}[3][-]{#3#1#2} % just an example

% Recommended, but optional, packages for figures and better typesetting:
\usepackage{microtype}
\usepackage{graphicx}
%\usepackage{subfigure}
\usepackage{booktabs} % for professional tables

\usepackage{xcolor}

% hyperref makes hyperlinks in the resulting PDF.
% If your build breaks (sometimes temporarily if a hyperlink spans a page)
% please comment out the following usepackage line and replace
% \usepackage{icml2023} with \usepackage[nohyperref]{icml2023} above.
\usepackage{hyperref}

%\def\lecturemark{}
%\fancyhf{}
%\fancyhead[L]{\lecturemark}
%\fancyfoot[C]{\thepage}

%\newcommand{\lecture}[1]%{\part{#1}\def\lecturemark{\partname\ \thepart: #1}}
%\renewcommand{\partname}{Problem}

% Attempt to make hyperref and algorithmic work together better:
\newcommand{\theHalgorithm}{\arabic{algorithm}}
%\usepackage{algorithm}
%\usepackage{algpseudocode}

\usepackage{multirow}
% If accepted, instead use the following line for the camera-ready submission:
% \usepackage[accepted it]{icml2023}

% For theorems and such
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{mathtools}
\usepackage{amsthm}
\usepackage{mathrsfs}

% if you use cleveref..
\usepackage[capitalize,noabbrev]{cleveref}


\usepackage[textsize=tiny]{todonotes}


% The \icmltitle you define below is probably too long as a header.
% Therefore, a short form for the running title is supplied here:
%\icmltitlerunning{Submission and Formatting Instructions for ICML 2023}


%\input{math_commands.tex}

\usepackage{hyperref}
\usepackage{url}

\usepackage{amsmath,amsthm,amssymb}
\usepackage{dsfont}
\usepackage{wrapfig}
\usepackage{etoolbox}
%\usepackage{subfigure}

%
%
\usepackage[noabbrev,capitalize]{cleveref}

%
\usepackage{bbm}


%
\usepackage{booktabs}

%
\usepackage{bm}
%\usepackage{paralist}
%
\usepackage[inline]{enumitem}
\usepackage{mathtools}
\usepackage{listings}
\usepackage{xcolor}
\definecolor{codegreen}{rgb}{0,0.6,0}
\definecolor{codegray}{rgb}{0.5,0.5,0.5}
\definecolor{codepurple}{rgb}{0.58,0,0.82}
\definecolor{backcolour}{rgb}{0.95,0.95,0.92}
\lstdefinestyle{mystyle}{
    backgroundcolor=\color{backcolour},   
    commentstyle=\color{codegreen},
    keywordstyle=\color{magenta},
    numberstyle=\tiny\color{codegray},
    stringstyle=\color{codepurple},
    %basicstyle=\ttfamily\footnotesize,
    %basicstyle=\medium,
    breakatwhitespace=false,         
    breaklines=true,                 
    captionpos=b,                    
    keepspaces=true,                 
    numbers=left,                    
    numbersep=5pt,                  
    showspaces=false,                
    showstringspaces=false,
    showtabs=false,                  
    tabsize=2
}
\lstset{style=mystyle}


%
\usepackage{graphicx}

\usepackage{ifthen} %
\usepackage{soul}
\usepackage{subcaption} % multiple panels per figure
\usepackage{booktabs}
\usepackage{adjustbox} % for adding tables
\usepackage{etoolbox}  % for ifiselse in fig and tab commands
\usepackage{accents}
\usepackage{apptools}
\usepackage{float}
\usepackage{scalerel,stackengine}
\usepackage{mathtools}
\usepackage[most]{tcolorbox}
\usepackage{cleveref}
\usepackage{caption}
\usepackage{comment}
\usepackage{tikz}
\usetikzlibrary{intersections,arrows, arrows.meta, decorations.markings, matrix, calc, bayesnet, automata, chains, backgrounds, positioning, fit, shapes}
\usepackage{soul}

%
%
%\usepackage{subfigure}
%
\usepackage{setspace}
%\let\Algorithm\algorithm
%\renewcommand\algorithm[1][]{\Algorithm[#1]\setstretch{1.0}}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% THEOREMS
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\theoremstyle{plain}
\newtheorem{theorem}{Theorem}[section]
\newtheorem{thm}{Theorem}[section]
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{prop}[theorem]{Proposition}

\newtheorem{corollary}[theorem]{Corollary}
\theoremstyle{definition}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{assumption}[theorem]{Assumption}
\theoremstyle{remark}
\newtheorem{remark}[theorem]{Remark}
\newtheorem{dassumption}{Diet Assumption}
\newtheorem{apxproposition}[theorem]{Proposition}
\theoremstyle{plain}
\newtheorem{lemma}[theorem]{Lemma}

\newcommand{\experimentalRow}[2]{#1 & #2 \\ \hline}


\newtheorem{rem}{Remark}[]
\patchcmd{\endrem}{\@endpefalse}{}{}
%\AfterEndEnvironment{rem{\noindent\ignorespaces}

% Todonotes is useful during development; simply uncomment the next line
%    and comment out the line below the next line to turn off comments
%\usepackage[disable,textsize=tiny]{todonotes}
%
\newcommand{\convD}{\stackrel{\mathcal{D}}{\longrightarrow}}
\newcommand{\convP}{\stackrel{\mathcal{P}}{\longrightarrow}}

\newcommand{\colorcheckmark}{{\color{green} \mathbf{\checkmark}}}
\newcommand{\colortimes}{{\color{red} \mathbf{\times}}}


%
\newcommand{\PA}{\operatorname{PA}}
\newcommand{\scm}{\mathcal{S}}

%
\newcommand{\VAR}{\mathbb{V}}
\renewcommand{\P}{\mathbb{P}}
\newcommand{\DO}{\operatorname{do}}

%\newcommand{\tens}[1]{%
%  \mathbin{\mathop{\otimes}\displaylimits_{#1}}%
%}

\newcommand{\vx}{\mathbf{x}}
\newcommand{\vm}{\mathbf{m}}
\newcommand{\E}{\mathbb{E}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\N}{\mathcal{N}}
\newcommand{\A}{\mathcal{A}}

\def \bc {\boldsymbol{c}}
\def \bz {\boldsymbol{z}}
\def \bZ {\boldsymbol{Z}}
\def \bX {\boldsymbol{X}}
\def \bA {\boldsymbol{A}}
\def \bY {\boldsymbol{Y}}
\newcommand{\indep}{\perp \!\!\! \perp}

\newcommand{\mex}{\text{MEX}}


%\newcommand{\I}{\mathcal{I}} % influence function
\newcommand{\I}{\mathbb{I}} % indicator function
\newcommand\numberthis{\addtocounter{equation}{1}\tag{\theequation}}
%

%

\newcommand{\ssh}[1]{\textcolor{blue}{[#1--- ss]}}
\newcommand{\ina}[1]{\textcolor{magenta}{[#1--- ina]}}
\newcommand{\imp}[1]{\textcolor{red}{[IMP: #1]}}
\newcommand{\red}[1]{\textcolor{purple}{[#1]}}

\newtoggle{iclr}
\togglefalse{iclr}

%\crefrangelabelformat{equation}%
%{(#3#1#4--#5\crefstripprefix{#1}{#2}#6)}
%\crefrangelabelformat{subequation}%
%{(#3#1#4--#5\crefstripprefix{#1}{#2}#6)}

% If you use natbib package, activate the following three lines:
%\usepackage[authoryear]{natbib}
%\renewcommand{\bibname}{References}
%\renewcommand{\bibsection}{\subsubsection*{\bibname}}

% If you use BibTeX in apalike style, activate the following line:
%\bibliographystyle{apalike}
%\bibliographystyle{ACM-Reference-Format}


% If your paper is accepted and the title of your paper is very long,
% the style will print as headings an error message. Use the following
% command to supply a shorter title of your paper so that it can be
% used as headings.
%
%\runningtitle{I use this title instead because the last one was very long}

% If your paper is accepted and the number of authors is large, the
% style will print as headings an error message. Use the following
% command to supply a shorter version of the authors names so that
% they can be used as headings (for example, use only the surnames)
%
%\runningauthor{Surname 1, Surname 2, Surname 3, ...., Surname n}

\title{Experimentation under Treatment Dependent Network Interference}

% The standard author block has changed for UAI 2025 to provide
% more space for long author lists and allow for complex affiliations
%
% All author information is automatically removed by the class for the
% anonymous submission version of your paper, so you can already add your
% information below.
%
% Add authors
\author[1]{\href{mailto:<sshankar@cs.umass.edu>?Subject=Your UAI 2025 paper}{Shiv Shankar}{}}
\author[2]{Ritwik Sinha}
\author[1]{Madalina Fiterau}
% Add affiliations after the authors
\affil[1]{%
    %College of Information and Computer Science\\
    University of Massachusetts\\
    Massachusetts, USA
}
\affil[2]{%
    Adobe Research\\
    California, USA\\
}

  
  \begin{document}
\maketitle

\begin{abstract}
Randomized Controlled Trials (RCTs) are a fundamental aspect of data-driven decision-making. RCTs often assume that units are not influenced by each other; when this assumption fails, the treatment of one unit can affect the outcomes of others (interference).
Traditional approaches addressing such effects assume a fixed network structure between the interfering units. However, real-world networks are rarely static, and treatment assignments can actively reshape the interference structure itself, as seen in financial access interventions that alter informal lending networks or healthcare programs that modify peer influence dynamics. This creates a novel and unexplored problem: estimating treatment effects when the interference network is determined by treatment allocation.
In this work, we address this gap by proposing two single-experiment estimators for scenarios where network edges depend on nodal treatments. The estimators are constructed from instrumental variables derived from neighbourhood treatments. We prove their unbiasedness and experimentally validate them on both synthetic and real data.
%The key contribution is a framework that allows for the estimation of heterogeneous interference effects even when the network is treatment-dependent.

%    \ina{We say that "our experiments validate our theoretical analysis, but we don't say what the analysis is. I think we should start by stating the properties of our estimators (lower bias and increased robustness), and the state, briefly, a compelling experimental result that supports them.}
\end{abstract}



\section{Introduction}

Randomized controlled trials (RCTs), also known as A/B tests, are a fundamental tool for assessing the effectiveness of interventions across multiple disciplines, including healthcare \citep{antman1992comparison} and digital platforms \citep{siroker2015b}. In such a test, treatment (group A) and control (group B) assignments are made independently of other variables, including potentially unknown ones. The outcomes from the two groups can be compared to estimate the desired causal effects. Such an experimentation-based approach empowers data-driven decision-making about the most effective treatments \citep{aral2011creating}.

Despite its basic soundness, A/B testing is not without challenges, particularly in large-scale experiments where key assumptions may not hold \citep{pouget2018dealing,shankar25online}. One major issue is interference between subjects, where individuals in the control group are indirectly affected by the treatment assigned to others. This spillover can distort the estimated treatment effect and lead to biased conclusions. For instance, in social networks, recommendations made to users in the treatment group may be shared with those in the control group, reducing the observed difference between the two groups \citep{brennan2022cluster,PougetAbadieSaveskiSaintJacquesDuanXuGhoshAiroldi17}. Similarly, in public health studies, herd immunity can lead to a spillover effect, making it challenging to isolate the direct impact of a vaccination program \citep{randolph2020herd,fine1993herd}.

This phenomenon, where treatment of a unit affects outcomes for other units, has been studied in the causal literature \citep{hudgens_halloran08,lesage2009introduction} under the name of interference. 
A common assumption in such studies is that the structure of interference is encoded by a network known a priori \citep{ogburn2017causal,leung2020treatment}. This is the \textit{neighbourhood interference} assumption, where interference is confined to neighbours in a graph. This dependence graph is typically inferred from observable data such as social connections \citep{AronowSamii17} or historical user interactions \citep{bakshy2012social,karrer2021network}, or from a user linking model~\citep{sinha2014estimating,saha2015probabilistic}.

However, in practice, the network structure obtained for post-experimentation analysis is rarely static \citep{heckman2015econometric,savje2024causal,sweet2020latent}. Furthermore, the interference graph itself may be affected by the treatment \citep{gao2024endogenous,rogowski2012estimating}. A classic example comes from a case study of the introduction of financial and banking access to households in an underdeveloped village \citep{prina2015banking}. In such communities, families and friends often serve as informal lenders in times of financial hardship. However, the introduction of banking access can change these informal lending connections. For example, units with bank access may no longer borrow from each other as they previously did. On the other hand, lending may increase between pairs of peers in which only one has bank access. Similarly, in healthcare interventions, individuals encouraged to join peer support groups will experience different levels of social influence than those unaware of such networks, making the interference structure dynamic rather than fixed \citep{arminen1998therapeutic}.
These cases \emph{introduce a new scenario, requiring the estimation of treatment effects when the network structure of interference depends on the treatment allocation}.

%which offered access to formal savings accounts to a random sample of female household heads in 19 villages in Nepal. My results demonstrate that IV estimation effectively addresses the endogeneity issue arising from un- observed confounders. Results from IV estimation and the OLS estimates in Prina (2015) can differ in both sign and significance when the indirect effect is significant. Additionally, I find that the intervention influences various outcomes through multiple channels: some are directly impacted by the treatment, while others operate through changes in the network mediator, which captures patterns of interference.

\paragraph{Contribution}
In this work, we consider a network interference scenario in which the existence of edges between nodes is determined by the treatments assigned to those nodes. We provide two different single-experiment estimators for this problem. We show them to be unbiased and experimentally validate their performance.



%In reality, the true interference structure is seldom known \citep{Egami2021, Savje2024}. Researchers frequently attempt to infer it from observable data, such as historical user interactions, and use this information to construct interference graphs (e.g., \cite{Aral2009, Bakshy2012, Bond2012, Coppock2016, Harshaw2021, Karrer2021}). However, these graphs are typically imperfect approximations of the actual interference structure.




%obtained, for instance, from devices with login information, from geolocation based on IP addresses 


%\ssh{Maybe create a contribution para}
%\ssh{improve the figure a bit}


\begin{figure*}
    \centering
    \includegraphics[width=0.9\textwidth,trim={0.3cm 0 0.3cm 0},clip]{diagram-20250209.pdf}
    \caption{$Z=1$ denotes the units in the treatment group and $Z=0$ denotes units in the control group. \textbf{(a)} Standard A/B testing, where there is no interaction between the treatment and the control units. \textbf{(b)} Network interference due to fixed (static) interaction between the units. \textbf{(c)} Treatment-dependent network interference: the red edges are potential edges that occur only under specific treatment allocations. \label{fig:1} }
\end{figure*}


\section{Related Work}




%\ssh{shrink related work to add space for more expt details later}

\paragraph{Network Interference}
Network interference is a well-studied topic in the causal inference literature \citep{BasseAiroldi15,cai2015social,chin2019regression,GuiXuBhasinHan15,ToulisKao13}. First formally identified by \citet{cox1958planning}, interference is a violation of the Stable Unit Treatment Value Assumption (SUTVA) \citep{rubin1978bayesian}. Network interference \citep{hudgens_halloran08} refers to the idea that the effects on a unit can be encapsulated in a neighbourhood structure.
Most approaches include assumptions about the interference neighbourhood \citep{bargagli2020heterogeneous,frank2020causal}. \citet{hudgens_halloran08} proposed a method based on clustered interference, which was later extended by \citet{zhang23a,ogburn2024causal,shankar2024unite} to allow more flexible network structures. Some other methods focus on using graphical causal models to directly adjust for interference \citep{ogburn2014causal,spohn2023graphical,shpitser2017modeling}.
\citet{shankar2024estimating} have extended the work on interference to other distributional quantities such as the median and the conditional value at risk (CVaR).
Linear interference models \citep{sussman2017elements,jiang2023causal,pouget2018dealing} and exposure mappings \citep{AronowSamii17,savje2021average} are common assumptions for incorporating heterogeneity in interference.
 \citet{oriordan2025local} extend methods based on linearity assumptions to include semi-parametric models. We summarize some common approaches and how our method differs from them in \cref{tab:lit_summary}. A detailed discussion of these is in the Appendix. 
 %However broadly the biggest challenge with these methods is assumption about the exact networks, though some 

%This simplifies GATE estimation by implicitly providing access to both the factual and counterfactual outcome. However, such a model is unrealistic for our motivating use case of continuous optimization. Furthermore, in more general settings, conducting multiple trials can be difficult, if not impossible  \citep{shankar23diet}. Thus, we aim to develop a \emph{method which can work with only a single trial and/or observational data from an existing test}.






        



\paragraph{Misspecified and Uncertain Interference}
A major challenge in network interference analysis is dealing with noisy or misspecified networks \citep{carroll2006measurement,ogburn2013bias,lockwood2016matching}. 
Recently, some methods have been developed to relax the strong assumptions often made in the network interference literature \citep[e.g.,][]{leung2022causal,wang2020causal,savje2024causal,auerbach2024exposure,shankar22cookie}. In a related direction, research has also focused on settings where the underlying network structure is unknown or only partially known \citep[e.g.,][]{chin2019regression, savje2021average,cortez2022exploiting,shankar25online}.
Most of these methods are based on multiple measurements~\citep{shankar22cookie,YuCortezEichhorn22,YuAiroldiBorgsChayes22}, though some other approaches exist based on outcome assumptions \citep{shankar2024online} and on uncertainty estimates for the network structure \citep{zhang23a}.
Other approaches include methods based on measurement error \citep{miao2018identifying,kuroki2014measurement} and confounding models \citep{shpitser2021proximal}. When networks are uncertain, methods for obtaining partial identification bounds for treatment effects have been proposed \citep{zhao2017sensitivity,yadlowsky2018bounds}.

However, these methods still assume a static network, i.e., a network that is fixed, though perhaps unknown. Departing from prior work, our study analyzes the scenario where \emph{the observed edges which characterize the interference are themselves dependent on the treatment assignments}. This introduces a unique challenge, as the very structure of interference becomes treatment-dependent.


\paragraph{Pseudoinverse Estimators}
Network interference is also related to a problem in slate and combinatorial bandits \citep{jia2024multi,xu2024linear}. Several works have addressed this challenge by assuming specific parametric models, such as linear relationships, to link slate features to outcomes \citep{a2,a10,a29}. A valuable tool in these settings is pseudoinverse estimation \citep{a7,a31,a11}. Other studies adopt a similar assumption but operate under a semi-bandit feedback model \citep{a19,a22,a21}. Our solution is inspired by these pseudoinverse estimators but solves a fundamentally different problem, as treatment-dependent interference is not directly addressable by these methods.
%This additional feedback can improve estimation accuracy but may not always be feasible in practice, particularly in applications where only partial or 


\begin{table*}[ht]
%\small
\centering
\caption{Literature Summary. We list a few important works, a few desiderata and whether they are met $\colorcheckmark$ or not $\colortimes$. Our work focuses on the problem of treatment dependent interference which the other methods do not handle. 
\label{tab:lit_summary}}
\begin{tabular}{|l||c|c|c|c|}
\hline
%Example Works &  Uncertain Graph & Non-Linear Model & No Extra Info \\
&\multicolumn{1}{|p{1.2cm}|}{\centering General \\ Graph }
& \multicolumn{1}{|p{1.2cm}|}{\centering Uncertain \\ Edges }
%& \multicolumn{1}{|p{1.2cm}|}{\centering Non-Linear \\ Outcome}
& \multicolumn{1}{|p{1.2cm}|}{\centering Single \\ Trial}
& \multicolumn{1}{|p{1.7cm}|}{\centering Treatment Dependent Network \\ }\\
\hline
%& \multicolumn{1}{|p{2cm}|}{\centering BosTaurus \\ CC }
\citep{hudgens_halloran08,LiuHudgens14} & $\colortimes$ &   $\colorcheckmark$  & $\colorcheckmark$ & $\colortimes$  \\
\citep{yuan2022causal,YuAiroldiBorgsChayes22}  & $\colorcheckmark$    &     $\colortimes$                &    $\colorcheckmark$        & $\colortimes$          \\
\citep{YuCortezEichhorn22,shankar22cookie}    & $\colorcheckmark$  &   $\colorcheckmark$  &      $\colortimes$            & $\colortimes$       \\
\citep{AronowSamii17,SavjeAronowHudgens17,ToulisKao13}  & $\colorcheckmark$            &      $\colortimes$    &     $\colorcheckmark$        & $\colortimes$   \\   
%\citep{ToulisKao13, EcklesKarrerUgander17,sussman2017elements} &  $\colorcheckmark$               &     $\colorcheckmark$    & $\colortimes$    \\   
Ours  (\cref{sec:OIV})&  $\colorcheckmark$  &   $\colorcheckmark$ & $\colorcheckmark$      & $\colorcheckmark$           \\ 
Ours  (\cref{sec:UIV}) &  $\colorcheckmark$ &   $\colorcheckmark$ &  $\colorcheckmark$      & $\colorcheckmark$        \\
%Ours  (Section 5.4) &  $\colorcheckmark$  &   $\colorcheckmark$ &     $\colortimes$      & $\colorcheckmark$    & $\colorcheckmark$            \\ 
\hline
\end{tabular}
\end{table*}
%A common approach is the exposure mapping framework which allows defines a degree of "belonging" of a unit to either the treatment or control group \citep{AronowSamii17, auerbach2021local, li2021causal, viviano2020experimental}. 
%A common assumption is that the network effect is linear with respect the neighbour treatments. 
%A limitation of these approaches is that they require complete knowledge of the network structure. While our approach also relies on imposing an exposure-based structure to the form of interference, however \emph{we work with an incomplete knowledge of the network}.


%These models are also related to dose-response literature, as the primary role of the exposure function is to provide a summary statistic which encapsulates all necessary information of the effective treatment.


%Another thread of work focuses on causal inference under assumptions about the interference neighbourhood \citep{bargagli2020heterogeneous, pmlr-v115-bhattacharya20a, UganderKarrerBackstromKleinberg13}.

%Treatment effect estimation with unknown network interference has also been well studied, beginning with the seminal work of \citet{hudgens_halloran08}. The key insight behind these works is that if the network can be broken into clusters, then one can perform treatment effect estimation without the full knowledge of the interference structure withing the clusters. Other works such as \citet{auerbach2021local,pmlr-v115-bhattacharya20a,LiuHudgens14,TchetgenVanderWeele12,VanderweeleTchetgenHalloran14} have extended this idea further. Often the bias of these estimators depends on the the number of edges between the clusters, which has led to optimization-based methods for constructing clusters~\citep{EcklesKarrerUgander17, GuiXuBhasinHan15}. However, this still requires information about the clusters, and is not applicable if multiple clusters of the required type do not exist. On the other hand, \emph{our method can handle general unstructured graphs}. Finally, there are methods, which under restrictive assumptions, use SUTVA based estimates for one-sided hypothesis tests for treatment effect under interference \citep{choi2017estimation,athey2019estimating,lazzati2015treatment}.

%Constructing good clusters is also computationally intensive \citep{abadi2020}
%\paragraph{Estimation with Unknown Interference}: 

\section{Notation}

We are given a population of $n$ units. Let $\bZ$ be the treatment assignment vector of the entire population and let $\mathcal{Z}$ denote the treatment space, e.g., for binary treatments $\mathcal{Z} = \{0,1\}^n$ (see Figure~\ref{fig:1}). We use the Neyman potential outcome framework \citep{Neyman1923,rubin1974estimating}, and denote by $Y_i(\bz)$ the potential outcome of unit $i$ under assignment $\bz \in \mathcal{Z}$.
%In this framework,  $Y_i(\bz)$ can be considered as fixed functions. Hence, randomness arises solely from the assignment of $\bZ$. 
We make observations at the unit level and denote the observation for unit $i$ by $Y_i$.

We consider randomized Bernoulli designs, i.e., each unit $i$ is assigned the treatment $z_i=1$ independently with probability $p_i \in (0,1)$. This design is natural and easy to implement, and it satisfies the standard randomization and positivity assumptions in causal inference.
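
To make the link to positivity explicit, note that under independent Bernoulli assignment the probability of any assignment vector factorizes as
\begin{equation*}
P(\bZ = \bz) = \prod_{i=1}^{n} p_i^{z_i} (1-p_i)^{1-z_i},
\end{equation*}
which is strictly positive for every $\bz \in \{0,1\}^n$ because each $p_i \in (0,1)$. For instance, with $n=3$ and $p_i = 1/2$ for all $i$ (values chosen purely for illustration), each of the $2^3 = 8$ possible assignments has probability $1/8$.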

\begin{tcolorbox}[enhanced,colback=white,colframe=black, coltitle=white, center title,title=Standard Causal Assumptions]
\vspace{-3ex}
\begin{align}
%&\text{Network Ignorability:}\;\;  Y(\bz) \indep \bZ  \: \forall \bz  \label{ass:seq} \tag{\textbf{A3}} \\
&\text{Positivity:} \;\; P(\bz) > 0 \: \forall \bz \label{ass:pos} \tag{\textbf{A1}} \\
&\text{Consistency:} \;\;  Y_i = Y_i(\bz) \text{ if } \bZ = \bz  \label{ass:cons} \tag{\textbf{A2}}
%\end{align}
%\begin{align}
\end{align}
\end{tcolorbox}


%Additionally we may have access to covariates $X_i$ at the units.
%Note that the units might have a common user, as presented in Figure~\ref{fig:1}.
We assume that the unit outcome is not determined just by the treatment at the unit but potentially also by treatments allocated to other units.
%, that is, $Y_i(\bz) \neq Y_i(z_i)$ \citep{Cox1958}. 
This violates SUTVA \citep{cox1958planning,hudgens_halloran08} and is commonly called interference.

This dependence can be represented as a graph (Figure \ref{fig:1}b), where each node represents a unit and the presence of an edge indicates possible influence between the two units. The underlying graph is given by its adjacency matrix $\bA \in \mathbb{R}^{n\times n}$, with $A_{ij} = 1$ only if an edge exists from unit $j$ to unit $i$, and by convention $A_{ii} = 1$.
Let $\mathcal{N}_i=\{j: A_{ij}=1\}$ be the set of \textit{neighbours} of unit $i$ in the unit-unit graph. We assume that the outcomes depend only on the node's neighbours in the unit-unit graph. 
This is similar to the classic network neighbourhood interference assumption \citep{hudgens_halloran08,sussman2017elements}. 
However, the classic network interference assumption is not valid in the scenario we consider, because the network itself depends on the treatment.
%Our work focuses on a treatment dependent network. 

%\begin{small}
%$$\forall \bz, \bz'  
%\text{ s.t. } \; z_i=z'_i \;  \text{and} \;  z_j = z'_j \; %\forall j \in \mathcal{N}_i : \\ 
%         Y_i(\bz) = Y_i(\bz').$$
%\end{small}

\begin{comment}
\begin{tcolorbox}[enhanced,colback=white,colframe=black, coltitle=white, center title,title=Network Interference]
\vspace{-3ex}
\begin{align*}
\label{eq:ani}
\forall \bz, \bz'  
\text{ s.t. } \; z_i=z'_i \;  \text{and} \;  z_j = z'_j \; \forall j \in \mathcal{N}_i : \\ 
         Y_i(\bz) = Y_i(\bz').
\tag{\textbf{A0}}
 \end{align*}

\end{tcolorbox}
\end{comment}

%This network interference assumption works consider a fixed network\citep{sussman2017elements}.

Instead, we have a two-stage generative process. First, a network forms in a treatment-dependent manner. Next, conditioned on the network thus formed, the standard network interference assumption is assumed to hold. To model the treatment dependence of the network, we treat the variables $A_{ij}(\bz)$ as an additional set of potential outcome variables, one for each possible edge in the network. Corresponding to the potential network edges, we also have neighbourhoods $\mathcal{N}_i(\bz)$. The fundamental interference assumption in our case can be stated as:
\vspace{-0.5cm}
\begin{tcolorbox}[enhanced,colback=white,colframe=black, coltitle=white, center title,title=Treatment Dependent Network Interference]
\vspace{-3ex}
\begin{align*}
\label{eq:ani}
\forall \bz, \bz'  
&\text{ s.t. } \; z_i=z'_i \;  \text{and}  \;\mathcal{N}_i(\bz) = \mathcal{N}_i(\bz') \\
&\text{and} \;  z_j = z'_j \; \forall j \in \mathcal{N}_i(\bz) : \\ 
         &Y_i(\bz) = Y_i(\bz').
\tag{\textbf{A3}}
 \end{align*}

\end{tcolorbox}
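
As a hypothetical illustration (not a model assumed in this paper), suppose that an edge between two distinct units forms only when exactly one of them is treated, loosely mirroring the lending example from the introduction:
\begin{equation*}
A_{ij}(\bz) = \I[z_i \neq z_j] \quad (i \neq j), \qquad A_{ii}(\bz) = 1 .
\end{equation*}
Under this rule, $\mathcal{N}_i(\vec{1}) = \mathcal{N}_i(\vec{0}) = \{i\}$, so no unit has a neighbour other than itself under either global assignment. In contrast, for a mixed assignment $\bz$, a treated unit $i$ has $\mathcal{N}_i(\bz) = \{i\} \cup \{j : z_j = 0\}$. The neighbourhoods observed in a randomized experiment can therefore differ from the neighbourhoods that determine $Y_i(\vec{1})$ and $Y_i(\vec{0})$.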






 %$\bA$ satisfies positivity if    $p_i^{(\bA)}(c_\ell)>0$ for all units $i=1,...,n$ and exposure values $\ell=1,...,L$. 
Our primary focus is on estimating the Global Average Treatment Effect (GATE) under the previously outlined scenario, where the network structure itself may change based on the chosen treatments. The desired causal effect is the mean difference between the outcomes when $\bz=\vec{1}$ (i.e., $z_i=1$ for all $i$) and when $\bz=\vec{0}$ (i.e., $z_i=0$ for all $i$). In the notation above, this causal effect is given by:
%To define causal effects under the above-described framework, we first define the mean potential outcomes $\mu(c_\ell) = \frac{1}{n}\sum_{i=1}^{n}Y_i(c_\ell),\; \ell = 1 ,\dots, L$. Causal effects are defined as the difference in the mean potential outcomes, 
\begin{equation}    \label{eq:definition_causal_effect}
    \tau(\vec{1}, \vec{0}) = \frac{1}{n}\sum_{i=1}^{n} \E[ Y_i(\vec{1}) -  Y_i(\vec{0})]
\end{equation}
where the expectation $\E$ marginalizes over the different networks. Correspondingly, we can also define the individual global treatment effect $\tau_i = \E[ Y_i(\vec{1}) -  Y_i(\vec{0})]$.



\paragraph {SUTVA Estimate}
The SUTVA estimate (or difference-in-means, DM, estimate) is given by
\[ \hat{\tau}_{\text{SUTVA}} = \bar{Y}^1 - \bar{Y}^{0} = \dfrac{\sum_{i} {Y}_i \mathbb{I}[Z_i = 1]}{ \sum_{i} \mathbb{I}[Z_i = 1]} - \dfrac{\sum_{i} {Y}_i \mathbb{I}[Z_i = 0]}{ \sum_{i} \mathbb{I}[Z_i = 0]} \]

where $\bar{Y}^{1}$ and $\bar{Y}^{0}$ are the averages of the observed outcomes over units with $Z_i=1$ and $Z_i=0$, respectively. This estimator, while simple and practical, requires SUTVA and hence can be misleading in our scenario.
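
As a small numerical illustration with made-up values, take $n=4$ units with assignments $\bZ = (1,0,1,0)$ and observed outcomes $(Y_1,Y_2,Y_3,Y_4) = (3,1,4,2)$. Then
\begin{equation*}
\hat{\tau}_{\text{SUTVA}} = \frac{3+4}{2} - \frac{1+2}{2} = 3.5 - 1.5 = 2 .
\end{equation*}
The computation uses only each unit's own treatment label. The treatments of other units never enter, which is why the estimate can be misleading when interference is present.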
%Since it is the difference in means of control and treatment groups, it is also called the difference in mean/ DM estimator. 
%Naturally, the strength/type of interference determines the estimate bias.
%


%\ssh{Perhaps shrink this a bit, or merge it with expt as these are baselines}

\section{Challenge and Formulation}
\paragraph {Inverse Propensity/Horvitz-Thompson Estimate}
A classic method to estimate treatment effects is the Horvitz-Thompson estimator \citep{horvitz1952generalization} (also called the IPW or IS estimator).
When all treatment decisions are independent Bernoulli variables with probabilities $p_i$, the Horvitz-Thompson (HT) estimator is as follows:


% \begin{align*}
% \frac{1}{n} \sum_{i=1}^n  Y_i \left(\frac{\mathbb{I}(\bz \textrm{ treats all of } \mathcal{N}_i)}{\Pr(\bz \textrm{ treats all of } \mathcal{N}_i)} - \frac{\mathbb{I}(\bz \textrm{ does not treat any of } \mathcal{N}_i)}{\Pr(\bz \textrm{ does not treat any of } \mathcal{N}_i)}\right).
% \end{align*}

% In the case when all treatment decisions are independent Bernoulli variables, this can be written as:

\begin{align}
\label{eq:tau_ht}
\tau_{\text{HT}}  &=  \frac{1}{n} \sum_i Y_i \left(  \frac{\prod_{j \in \mathcal{N}_i} z_j}{ \prod_{j \in \mathcal{N}_i } p_j} - \frac{\prod_{j \in \mathcal{N}_i} (1-z_j)}{ \prod_{j \in \mathcal{N}_i } (1-p_j)} \right) \nonumber \\
 &= 
 \frac{1}{n} \sum_i Y_i \left(  \prod_{j \in \mathcal{N}_i} \frac{z_j}{p_j} - \prod_{j \in \mathcal{N}_i} \frac{(1-z_j)}{(1-p_j)} \right)
\end{align} 
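
To get a sense of the scale of the weights in \cref{eq:tau_ht}, consider as a purely illustrative special case (our assumption, not a setting taken from the experiments) a homogeneous design with $p_j = p$ for all $j$. A unit whose neighbourhood $\mathcal{N}_i$ is fully treated then receives the weight
\[
\prod_{j \in \mathcal{N}_i} \frac{1}{p} \;=\; p^{-|\mathcal{N}_i|},
\qquad \text{e.g.,} \quad p = \tfrac{1}{2},\; |\mathcal{N}_i| = 10 \;\Rightarrow\; 2^{10} = 1024,
\]
while the event itself occurs with probability $p^{|\mathcal{N}_i|} = 1/1024$. The weights therefore grow exponentially in the number of units entering the product, which is one reason HT estimates can have very large variance on dense networks.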
 

%A similar formula exists for the Hajek style estimator with the denominators $\prod_{j \in \mathcal{N}_i } p$ and $\prod_{j \in \mathcal{N}_i } (1-p)$, replaced by their self normalized values.

If the network is fixed, the IPW estimate (and its variants) requires no assumptions beyond randomization and positivity. Unfortunately, when the network depends on the treatment vector $Z$, the HT estimator is no longer unbiased.

For example, consider the three-node graph with nodes $L$, $R$, and $U$ shown in Figure \ref{fig:counter}. Each node receives a binary treatment (either 0 or 1).

   \begin{figure}[htp]
    \centering
    \begin{tikzpicture}[
      scale=0.1,
      node distance=1cm and 0cm,
      observed_node/.style={minimum size=1cm,fill=lightgray,text=black,draw=black,circle,text width=0.5cm,align=center},
      deterministic_observed_node/.style={minimum size=1cm,fill=lightgray,text=black,draw=black,circle,text width=0.5cm,align=center, double=none, double distance=1pt, even odd rule},
      hidden_node/.style={minimum size=1cm,fill=white,text=black,draw=black,circle,text width=0.5cm,align=center},
      deterministic_hidden_node/.style={minimum size=1cm,fill=white,text=black,draw=black,circle,text width=0.5cm,align=center, double, double distance=1pt},
      text_only_node/.style={minimum size=0.001cm,fill=white,text=black,draw=white,circle,text width=0.05cm,align=center},
    ]
   %% \node[hidden_node] at (30,-10*0.866) (E) {$E$};
   %% \node[observed_node] at (10,-20*0.866)  (E_star) {$\tilde{E}$};
   %% \node[observed_node] at (15,0)  (Y) {$Y$};
   %% \node[observed_node] at (45,0) (X) {$X$};
   %% \node[observed_node] at (50,-20*0.866) (Z) {$Z$};
   
    \node[hidden_node] at (30,0.866) (U) {$U$};
    \node[hidden_node] at (15,-30*0.866)  (L) {$L$};
    \node[hidden_node] at (45,-30*0.866) (R) {$R$};
    \path %(Z) edge[-latex] (X)
    %(X) edge[-latex] (E)
    (U) edge[-latex] (L)
    (U) edge[-latex] (R);
    \end{tikzpicture}
    \captionof{figure}{Counterexample demonstrating the bias of the standard HT estimate. The figure shows two edges, one between $U$ and $L$ and another between $U$ and $R$. These are potential edges, however: once treatment is allocated, only one of them is observed while the other vanishes. This shift of the edge between counterfactuals causes the bias in the HT estimate.}
    \label{fig:counter}
\end{figure}

Edge $UL$ exists if and only if $Z_U=1$; otherwise the edge $UR$ exists. However, the outcomes $Y_L$ and $Y_R$ at $L$ and $R$ are independent of the treatment at $U$ and depend only on the node's own treatment, with a constant effect $\alpha$; i.e., the outcomes are $Y_{L}(1) = Y_{L}(0) + \alpha_{L}$ and $Y_{R}(1) = Y_{R}(0) + \alpha_{R}$. All treatments are randomized independently with probability $q=0.5$.


We consider the total treatment effect (TTE), or global average treatment effect (GATE), between $Z=\vec{0}$ and $Z=\vec{1}$ and apply the HT estimate. By symmetry, we analyze only the pair $U,L$; the $U,R$ case is analogous.
Consider the standard HT estimator: the relevant treatments $(Z_U, Z_L)$ take 4 possible values, each with probability 0.25. When $Z_U=0$, the observed network and the counterfactual network are the same, and hence, conditional on $Z_U=0$, the contribution of $L$ to $\tau_{HT}$ is unbiased (its expectation is $Y_L(1) - Y_L(0)$). However, when $Z_U=1$, the HT estimator takes the edge $UL$ into account. Thus when $Z_U=1,Z_L=0$, both propensity terms in the estimator are zero, and the contribution is 0. The expected contribution of node $L$ to the HT estimator over all treatment allocations is therefore $Y_L(1) + \frac{Y_L(1) - Y_L(0)}{2}$.\footnote{A more detailed case analysis is in the Appendix.} Similarly, the contribution from node $R$ is $-Y_R(0) + \frac{Y_R(1) - Y_R(0)}{2}$.

Hence, averaging over $L$ and $R$, the expected value of the estimator is $\frac{Y_L(1) - Y_R(0)}{2} + \frac{\alpha_L + \alpha_R}{4}$. On the other hand, the true treatment effect is the mean of $Y(1) - Y(0)$ over these nodes, i.e., $(\alpha_L + \alpha_R)/2$. Thus the HT estimator is biased.
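
The case analysis for node $L$ can be sketched as follows. We assume, as in \cref{eq:tau_ht}, that the product over $\mathcal{N}_L$ includes $L$ itself, and that all propensities equal $0.5$. The HT weight multiplying $Y_L$ in each of the four equally likely allocations is then:
\begin{center}
\begin{tabular}{ccccc}
\toprule
$Z_U$ & $Z_L$ & $\mathcal{N}_L$ & HT weight & Contribution \\
\midrule
0 & 1 & $\{L\}$ & $\frac{1}{0.5} = 2$ & $2\,Y_L(1)$ \\
0 & 0 & $\{L\}$ & $-\frac{1}{0.5} = -2$ & $-2\,Y_L(0)$ \\
1 & 1 & $\{L,U\}$ & $\frac{1}{0.25} = 4$ & $4\,Y_L(1)$ \\
1 & 0 & $\{L,U\}$ & $0$ & $0$ \\
\bottomrule
\end{tabular}
\end{center}
Averaging the four contributions with probability $\tfrac{1}{4}$ each gives
\[
\tfrac{1}{4}\bigl(2\,Y_L(1) - 2\,Y_L(0) + 4\,Y_L(1) + 0\bigr) \;=\; Y_L(1) + \frac{Y_L(1) - Y_L(0)}{2}.
\]
In the last row both products vanish: $z_L z_U = 0$ because $z_L = 0$, and $(1-z_L)(1-z_U) = 0$ because $z_U = 1$.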

%If we consider TTE (GATE) between $Z=\vec{0}$ and $Z=\vec{1}$  and use the HT formula, the $\E[\tau_{HT}] = Y_L-Y_R$  the actual TE in this case is 0.


The problem arises because the HT estimator builds its inverse probability weights from the observed network: if the edge between $U$ and a node ($L$ or $R$) is not observed, the corresponding propensity term is left out of the weight. Between the two possible values of $Z_U$, the edge, and with it the extra weight, moves from $L$ to $R$, which causes the bias. We discuss the issue with HT estimation more formally in the Appendix.

%If we also weighed the results of L for Z=0 ( and R for Z=1), then it would have been unbiased.





%\section{Method}

\textbf{Outcome Model (Additive Interference):}
%$$Y_i(\bz)= b_{i} + \sum c_{ij} E_{i,j}(z_i,z_j)z_j$$

$$Y_i(Z)= b_i + c_{ii} Z_i + \sum_{j} c_{ij} \tilde{Z}_{ij}$$

where $b_{i}$ is the baseline effect, $c_{ii}$ is the direct effect of treatment, $\tilde{Z}_{ij}$ refers to individual factors arising from the treatment vector, and $c_{ij}$ is the influence of factor $j$ on node $i$. In the case of standard linear network interference, $\tilde{Z}_{ij} = Z_j$. Higher-order network dependence can also be modeled here through multiplicative interaction terms between the components of $Z$, but in this paper we focus on the linear case.

\begin{tcolorbox}[enhanced,colback=white,colframe=black, coltitle=white, center title,title=Linear Additive Interference]
\vspace{-3ex}
\begin{align*}
\forall i, \; Y_i(\bz)= b_i + c_{ii} z_i + \sum_{j} c_{ij} A_{ij} z_j \label{ass:add} \tag{\textbf{A4}}    
\end{align*}
\end{tcolorbox}

\begin{remark} We have not yet assumed anything about $c_{ij}$, and thus our method supports heterogeneous effects.
\end{remark}
\begin{remark}
    The counterexample presented earlier satisfies additive interference. Thus this assumption alone is not enough to solve the problem.
\end{remark}
%The outcome model can overall be written as:




%We are considering the case where the edge $E_{i,j}$ depends on the treatments $Z_i,Z_j$

%We can write the outcome as a vector regression equation $Y_i = c_{ii} + \tilde{Z}^T c_i$  where we have stacked the coefficients $c_{ij}$ into a single vector.




Recall that the GATE is $\tau = \frac{1}{n}\sum_{i} \tau_i$, with individual effects
$\tau_i = \E[ Y_i(\vec{1})] - \E[Y_i(\vec{0})] $,
where $\vec{1}$ and $\vec{0}$ represent the
all-ones (all treated) and all-zeros (all untreated) treatment vectors. Substituting the outcome model into this definition, we get
$$\tau_i = c_{ii} + \sum_j c_{ij} \E[A_{ij}|\bz =\vec{1}] $$
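
For completeness, the substitution can be written out step by step. This is only a sketch under Assumption \ref{ass:add}, treating the coefficients $b_i$, $c_{ii}$, $c_{ij}$ as fixed and taking the expectation over the network:
\begin{align*}
\E[Y_i(\vec{1})] &= b_i + c_{ii} + \sum_j c_{ij}\,\E[A_{ij} \mid \bz=\vec{1}], \\
\E[Y_i(\vec{0})] &= b_i, \\
\tau_i &= \E[Y_i(\vec{1})] - \E[Y_i(\vec{0})] = c_{ii} + \sum_j c_{ij}\,\E[A_{ij} \mid \bz=\vec{1}].
\end{align*}
The baseline $b_i$ cancels, and under $\bz=\vec{0}$ the direct and spillover terms vanish because each is multiplied by a zero treatment.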
%$$ \tau = \frac{1}{n}\sum \tau_i$$

\section{Estimation}
In this section we first present a general matrix representation framework, based on matrix pseudoinverses, for estimating the treatment effect $\tau$. We then show how this design fails in the treatment-dependent network case because of a hidden endogeneity. Finally, we discuss how this suggests a solution to the problem: introducing instrumental variables.

\subsection{Matrix Representation}

The discussion in this section follows the presentation of \citet{a7}.
Let $\N_i$ be the \emph{fixed} set of neighbours of a specific ego node $i$.  Consider a hypothetical scenario where we observe a collection of \(R\) experiments, each conducted with a different treatment vector $Z$. Let \(Y_i^r\) be the observed outcome at node \(i\) in the \(r\)-th trial. Under the linear-additive assumption, we can write:

\[
Y_i^r \;=\; b_i \;+\; c_{ii}\,Z_i^r \;+\; \bigl(Z_{\N(i)}^r\bigr)^\top c_{i},
\]

where \(Z_i^r\) is the treatment of \(i\) in trial \(r\), \(Z_{\N(i)}^r\) is the vector of treatments corresponding to the neighbors of \(i\) (or nodes from which \(i\) receives interference) in trial \(r\), \(c_{ii}\) is the direct effect of treating \(i\), and \(c_i\) is the vector of marginal effects of each neighbor's treatment on \(i\).
We can express the variables from these hypothetical trials in matrix form as follows:

\[
\underbrace{\begin{bmatrix}
    Y_i^1 \\
    Y_i^2 \\
    \vdots \\
    Y_i^R 
\end{bmatrix}}_{R\times 1}
\;=\;
\underbrace{\begin{bmatrix}
    1 & Z_i^1 & \bigl(Z_{\N(i)}^1\bigr)^\top \\
    1 & Z_i^2 & \bigl(Z_{\N(i)}^2\bigr)^\top \\
    \vdots & \vdots & \vdots \\
    1 & Z_i^R & \bigl(Z_{\N(i)}^R\bigr)^\top
\end{bmatrix}}_{R\times d}
\underbrace{\begin{bmatrix}
    b_i \\
    c_{ii} \\
    c_i
\end{bmatrix}}_{d\times 1}
\quad \Rightarrow \quad
\bY_i \;=\; \bZ_i \,\bc_i.
\]

Here, \(d\) is the dimension of the parameter vector $\bc_i$, which includes the baseline \(b_i\), the direct treatment effect \(c_{ii}\), and the vector of neighbor-treatment effects \(c_i\). Given results from sufficiently many such random assignments of
$\bz$, so that $\bZ_i$ has full column rank, the least-squares estimator is unbiased for
$\bc_i$.
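
As a small illustration of this representation, consider a hypothetical instance (not taken from the experiments): noise-free outcomes, an ego $i$ with a single fixed neighbour $j$, and the ego's own treatment included as a regressor in line with Assumption \ref{ass:add}. Take four trials with $(Z_i, Z_j) \in \{(0,0),(1,0),(0,1),(1,1)\}$:
\[
\begin{bmatrix} Y_i^1 \\ Y_i^2 \\ Y_i^3 \\ Y_i^4 \end{bmatrix}
\;=\;
\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} b_i \\ c_{ii} \\ c_{ij} \end{bmatrix}.
\]
The design matrix has full column rank, so the parameters are recovered exactly: $b_i = Y_i^1$, $c_{ii} = Y_i^2 - Y_i^1$, and $c_{ij} = Y_i^3 - Y_i^1$. If only the allocations $(0,0)$ and $(1,1)$ were observed, the rank would drop to 2 and only the sum $c_{ii} + c_{ij}$ would be identified.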


\subsection{Treatment \emph{dependent} graph}

Now in our scenario, where the network edges depend on the treatment allocation, the network structure may change from trial to trial. Consequently, for each experiment \(r\), the set of neighbors \(\N(i)\) can vary, leading to different observed components in \(Z_{\N(i)}^r\).

\begin{comment}
\[
\bm Y \;=\; Z\bm c, 
\quad 
X := D\odot Z,
\]
with $D=[d_{ij}]$ and $Z=\bm 1\,\bm z^{\!\top}$; ``$\odot$'' is the Hadamard
product.  Because $D$ and 
$Z$ are independent
\end{comment}

Hence, we need to modify the previous approach to include the variables $A_{ij}$. We consider the situation in which whether node $j$ has an effect on $i$ depends only on $z_j$, that is, $A_{ij}(\mathbf{z}) \;=\; A_{ij}(z_j)$. 


The structural equation becomes
\begin{equation}
\label{eq:struct}
Y_i \;=\; b_i \;+\; c_{ii}\,Z_i \;+\; \sum_{j=1}^{n} A_{ij}(Z_j)\,c_{ij}\,Z_j .
\end{equation}
Defining the \emph{ideal} (but unobserved) regressors
\(
X_{ij} := A_{ij}(Z_j)\,Z_j,
\)
we have the relation $\bY_i \;=\; \bX_i \,\bc_i$.
%.express the outcomes \(\mathbf{Y}_i\) and the regressor matrix \(\mathbf{Z}_i\) as follows:
Once again, given a sufficient number of trials, $\bc_i$ can be estimated; however, that is not feasible in a standard RCT.

With a limited number of trials, one cannot observe all the network configurations. Instead, one uses the entries of $Z$ selected by the network observed in the trial, and the corresponding design matrix ignores the `counterfactual' edges that would arise under alternate treatment allocations. 








For simplicity, consider the network obtained from a single trial with treatment allocation $Z^1$. If we naively regress on the $Z_j$ selected by the observed network, then we have
\[
X_{ij}= A_{ij}(Z^1_j) Z_j + q_{ij},
\quad
q_{ij}:=(A_{ij}(Z_j)- A_{ij}(Z^1_j))\,Z_j .
\]
Hence the observed design matrix is $W = X - Q$, with entries $W_{ij} = A_{ij}(Z^1_j)\,Z_j$ and
$Q=[q_{ij}]$, and the spillover term of~\eqref{eq:struct} can be rewritten as
\[
\sum_{j} c_{ij}\,X_{ij}
=\sum_{j} c_{ij}\,W_{ij}
\;+\;
\underbrace{\sum_{j} c_{ij}(A_{ij}(Z_j)-A_{ij}(Z^1_j))Z_j}_{\varepsilon_i}.
\]
Because $\varepsilon_i$ contains functions of $Z_j$, it is in general correlated with the observed regressors:
\[
\mathbb{E}\bigl[W_{ij}\,\varepsilon_i\bigr]
=\mathbb{E}\Bigl[W_{ij}\sum_{k} c_{ik}(A_{ik}(Z_k)-A_{ik}(Z^1_k))Z_k\Bigr]\neq 0 .
\]
Thus the standard regression assumption of \emph{orthogonality fails}:
the regressors are correlated with the regression error, just as in the
standard errors-in-variables or endogenous-regressor problem.
Thus, if we attempt to apply ``static'' network interference methods (which assume a fixed set of neighbors and fully observed edges), we end up effectively estimating a regression with an endogenous error term \citep{sargan1958estimation,bowden1990instrumental}.


The presence of the unobserved or ``missing'' edges shifts part of the structure into an unobserved confounding term, rendering a naive regression approach potentially biased.
As detailed, this is reminiscent of \emph{endogenous error} encountered in classical econometrics: the missing (or unobserved) regressors are subsumed into the error term, potentially violating standard exogeneity assumptions. 
This connection also hints at a solution: the standard method to address endogeneity in econometrics is to use \emph{instrumental variables (IV)}. We propose a similar approach of using IVs. In the next section, we illustrate how IV based methods yield consistent estimates of the treatment (and spillover) effects despite partial observation of the complete network structure.


 






\subsection{IV based Estimation}

Suppose we have access to mean-zero instrumental variables \(V\).
From the outcome model, where for brevity the ego's own treatment $Z_i$ is absorbed into $Z_{\N(i)}$ and the direct effect $c_{ii}$ into $c_i$,
\[
Y_i \;=\; b_i \;+\; (Z_{\N(i)})^\top c_i,
\]
we multiply both sides by \(V\) and take expectations:

\[
\E[VY_i] 
\;=\; 
\E[V]\,b_i + \E[V\,(Z_{\N(i)})^\top]\, c_i.
\]
Since \(\mathbb{E}[V] = 0\), the term \(\mathbb{E}[V]\,b_{i}\) vanishes. Solving for \(c_i\) yields:

\[
c_i 
\;=\;
\Bigl(\mathbb{E}\bigl[V\,(Z_{\N(i)})^\top\bigr]\Bigr)^{-1} 
\;\mathbb{E}\bigl[V\,Y_i\bigr].
\]

Hence, provided \(\mathbb{E}[V\,(Z_{\N(i)})^\top]\) is invertible, we can recover \(c_i\) consistently by using this moment equation.

\paragraph{Single-Sample Estimation}

\begin{comment}
Consider a hypothetical scenario where the same experiment is repeated multiple times each with a different $Z$ vector. For simplicity consider the case when $\tilde{Z}_{ij} = Z_j$ and $A_{ij}(\bz) =. A_{ij}(z_j) $. That is a case with linear additive interference structure, and where the influence from node j to i only depends on the source node allocation $z_j$. We can write the following matrix-like form:

 $$
 \underbrace{\begin{bmatrix}
    Y_i^1 \\
    Y_i^2 \\
    \vdots \\
    Y_i^r \\
\end{bmatrix}}_{r\times1}
= 
  \underbrace{ \begin{bmatrix}
    1 &  Z^1_{N{i}} \\
    1 &  Z^2_{N{i}} \\ 
    \vdots \\
    1 &  Z^r_{N{i}}) \\
\end{bmatrix}}_{r\times d}
 \underbrace{\begin{bmatrix}
 c_{ii} \\
 c_i
\end{bmatrix}}_{d\times 1}
\Rightarrow \bY_i = \bZ_i^T \bc_i 
$$
In our treatment dependent network scenario, the number of observed components of $(Z^1_{N{i}})$ can change from trial to trial. Hence the above equation needs to be considered as a hypothetical matrix for which we will only observe some of the columns and a single row.
But even then, we can see a clear difference from the scenario when all edges are static. When all edges are static, we will observe all the columns of the $\bZ$ matrix. On the other hand we do not observe the missing edges, we are merging the missing edges into the error/noise term. Furthermore when $Z$ changes, a different set of terms become observed. This is very similar to the case of endogenous error often found in many different applications. Thus methods developed for applying static network interference methods, when applied in this scenario, are trying to estimate the regression model in an endogenous error scenario \citep{sargan1958estimation,bowden1990instrumental}. . 
However motivated from this insight we next present a way to use instrument variables to estimate the treatment effect.

\section{Estimation via Instrumental Variables}
Assume access to instrumental variables $V$ with:
$\E[V]=0$ (mean zero).


From the outcome model:
$Y_i = c_{ii} + (Z_{N{i}})^T c_{i} $

Multiply by $V$ and take expectations:
\begin{align*}
&V Y_i  = Vc_{ii} + V(Z_{N{i}})^T c_{i} \\
&\E[V Y_i]  = \E[V]c_{ii} + \E[V(Z_{N{i}})^T] c_{i}
\end{align*}
 
Solving for $c_i$ we get:
$$c_i = \E[V(Z_{N{i}})^T]^{-1} \E[VY_i] $$


\subsection{Single-Experiment Estimation}
While the above equation holds, for expectations one can obtain consistent estimators for $c_i$ by using the sample analog:
$$\hat{c}_i = [ \sum \frac{1}{R}V^r (Z^r_{N{i}})^T]^{-1} [ \frac{1}{R} \sum V^rY^r_i] $$
\end{comment}
While the above equation holds for expected values, one can obtain consistent estimators by using its sample analog. Suppose we run \(R\) experiments indexed by \(r\), observe \(\{V^r, Z_{\N(i)}^r, Y_i^r\}\), and form:

\[
\hat{c}^R_i 
\;=\; 
\Bigl[\tfrac{1}{R}\,\sum_{r=1}^R V^r\,\bigl(Z_{\N(i)}^r\bigr)^\top\Bigr]^{-1}
\Bigl[\tfrac{1}{R}\,\sum_{r=1}^R V^r\,Y_i^r\Bigr].
\]

By construction, \(\hat{c}^R_i\) is a consistent estimator of \(c_i\) as \(R\) grows. Moreover, if the matrix \(\mathbb{E}[V\,(Z_{\N(i)})^\top]\) is known (or can be computed from external information), then even a single experiment could suffice. In that scenario,

\[
\hat{c}_i 
\;=\;
\Bigl(\mathbb{E}\bigl[V\,(Z_{\N(i)})^\top\bigr]\Bigr)^{-1} \, \bigl(V\,Y_i\bigr),
\]

and since \(V\,Y_i\) is an unbiased estimate of \(\mathbb{E}[V\,Y_i]\), \(\hat{c}_i\) remains unbiased.

%Moreover, if the matrix $\E[V(Z_{N{i}})^T]$ is known or computable, then one can get unbiased estimates even with a single experiment. To do is one can compute the inverse matrix explicitly, and use single sample estimate for $\E[V^rY^r_i]$. Since we have a linear transformation of an unbiased sample, the estimator itself is unbiased.


In the above argument, the matrix $\E[V(Z_{\N(i)})^\top]$ was assumed to be invertible. In general, however, this will not be the case. For a non-invertible matrix one can use the Moore-Penrose pseudo-inverse.
If $\E[V(Z_{\N(i)})^\top]$ has full column rank, the estimates remain unbiased. Thus we have the following estimator:
\begin{align}
    \hat{c}_i =  \E[V (Z_{\N(i)})^\top]^{+} \Bigl[ \tfrac{1}{R}\sum_{r=1}^{R} V^rY^r_i \Bigr] \label{eq:piest} 
\end{align}


\subsubsection{Identification Condition}
For identification, we require the following conditions:
\begin{itemize}
\item \textbf{Relevance}:  \(V\) is correlated with \(Z_{\N(i)}\),
\item \textbf{Exclusion}: \(V\) affects \(Y_i\) only through \(Z_{\N(i)}\).
\end{itemize}
Both of these conditions are standard in the IV literature \citep{angrist1996identification,sargan1958estimation,bowden1990instrumental,bonet2013instrumentality}. Relevance ensures that $V$ captures enough variation in $Z$ for $\E[V(Z_{\N(i)})^\top]$ to be non-singular. Exclusion ensures that $VY_i$ does not have any systematic $Z$-dependent component.

A common instrument in network settings is the treatment of neighbours \citep{drago2020compliance,rogowski2012estimating}. In our setting too, these variables can serve as valid instrumental variables \citep{rogowski2012estimating}. Specifically, for each node $j$ we create the instrument $V_j = \frac{Z_j}{p} - \frac{(1-Z_j)}{(1-p)}$. By construction, $V_j$ is correlated with $Z_{\N(i)}$ if $j \in \N(i)$, thus satisfying relevance. However, exclusion is not always satisfied, specifically if $j$ appears in $\N(i)$ for one allocation but not for another.
Next, we describe in detail two specific methods that leverage the IV-based pseudoinverse estimator, using two different constructions of neighbourhood-based IVs.
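
Before turning to these constructions, a quick check on the instrument $V_j$ defined above may be useful. The following is a sketch assuming independent Bernoulli($p$) treatments:
\begin{align*}
\E[V_j] &= p\cdot\frac{1}{p} \;-\; (1-p)\cdot\frac{1}{1-p} \;=\; 0, \\
\E[V_j Z_j] &= p\cdot\frac{1}{p}\cdot 1 \;+\; (1-p)\cdot\Bigl(-\frac{1}{1-p}\Bigr)\cdot 0 \;=\; 1, \\
\E[V_j Z_k] &= \E[V_j]\,\E[Z_k] \;=\; 0 \qquad (k \neq j).
\end{align*}
So $V_j$ has mean zero, and when $V$ and $Z$ are indexed by the same fixed set of units, $\E[V Z^\top]$ is the identity matrix. The difficulty in our setting is that membership in $\N(i)$ can change with the allocation, so this fixed indexing is not available from the observed network alone.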
%\footnote{We describe a third method with multiple measurements in the Appendix. However since it is effectively the same estimator as that in \citep{YuCortezEichhorn22}, we do not analyze }.

\subsection{Estimators}

\subsubsection{Overcomplete Estimator}
\label{sec:OIV}
Consider the scenario where, for each node $i$, we know a superset of all possible neighbours under all possible treatment allocations. Let us denote this set by $\mathcal{M}_i$. 
\begin{tcolorbox}[enhanced,colback=white,colframe=black, coltitle=white, center title]
\vspace{-3ex}
\begin{align*}
\text{Neighbourhood Superset:} \;\; \mathcal{M}_i \supseteq \mathcal{N}_i(\bz) \; \forall i,\bz\label{ass:superset} \tag{\textbf{A5}}    
\end{align*}
\end{tcolorbox}

In such a case, the treatments of all units in $\mathcal{M}_i$ provide an overcomplete set of instruments.


The estimator is presented in \cref{eq:te_oiv}. In this setting, the GATE estimator becomes the estimator of \citet{sussman2017elements}, which itself can be seen as a version of the standard pseudo-inverse estimator \citep{swaminathan2017off,a7}.
\begin{align}
\label{eq:te_oiv}
\hat{\tau}_{\text{OIV}} = \frac{1}{n} \sum_i Y_i  \sum_{j \in \mathcal{M}_i} \left(  \frac{z_j}{p} - \frac{(1-z_j)}{(1-p)}  \right).
\end{align}
The derivation of the above estimator from \cref{eq:piest} is in the Appendix (\cref{lem:pinverse}).

\begin{prop}
Under assumptions \textbf{A1-4, A5}, $\hat{\tau}_{\text{OIV}}$ is an unbiased estimate of the treatment effect $\tau$.
\end{prop}
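
The following sketch indicates why knowing the superset helps. It is not the Appendix proof, and it rests on assumptions we state explicitly: independent Bernoulli($p$) treatments, $A_{ij}(\bz) = A_{ij}(z_j)$, and $i \in \mathcal{M}_i$, with the $j=i$ term carrying the direct effect $c_{ii}$. Writing $V_j = \frac{Z_j}{p} - \frac{1-Z_j}{1-p}$, for $j \neq i$ we have
\begin{align*}
\E[Y_i V_j] &= b_i\,\E[V_j] + c_{ii}\,\E[Z_i V_j] + \sum_{k \neq i} c_{ik}\,\E[A_{ik}(Z_k)\,Z_k\,V_j] \\
&= c_{ij}\,\E[A_{ij}(Z_j)\,Z_j\,V_j] \;=\; c_{ij}\,\E[A_{ij} \mid Z_j = 1],
\end{align*}
since $\E[V_j]=0$, $V_j$ is independent of $Z_k$ for $k \neq j$, and $Z_j V_j = Z_j/p$ for binary $Z_j$. The same argument gives $\E[Y_i V_i] = c_{ii}$. Summing over $j \in \mathcal{M}_i$ recovers $\tau_i$, because by Assumption \ref{ass:superset} no edge into $i$ can originate outside $\mathcal{M}_i$. The weights are built from $\mathcal{M}_i$ and so do not depend on which edges happen to be observed.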


\begin{remark}
    While Assumption \ref{ass:superset} can be strong, it is satisfied in many scenarios. As a simple example, consider all nodes which share a geographic location (or, when the units are mobile devices, an IP address). This is
very likely to be a superset of all interactions this unit can have. In other cases, user modeling and device-linking methods are used to identify neighbours based on confidence scores, i.e.,
they have a probabilistic version of the adjacency matrix $\bA$.
Such a method can usually be adapted to obtain a superset
of neighbours with high probability (by including even low-confidence
nodes as neighbours).
\end{remark}





We now turn to the case when we do not have enough IVs. For the linear case we would require as many instruments as nodes. This, along with the relevance criterion, can be hard to satisfy, so a method which works with fewer instruments is more valuable for some applications.




%Next we discuss how to address the question of not having enough instruments.


%Before we go further into estimating this quantity, we point two things: a) Under some assumptions, we may not need any adjustment. 

\subsubsection{Undercomplete Estimator} 
\label{sec:UIV}
 In this section we consider the case of undercomplete $V$. As earlier, the treatments of neighbouring nodes are used to create the instruments. However, the set of observed neighbours does not qualify as a set of valid instruments.\footnote{Using only the observed neighbours is the same as assuming static interference, which, as shown earlier, leads to biased estimation.}
  The method from the previous section used a superset $\mathcal{M}_i$ of all possible neighbours, or equivalently a set which contains the union of all the neighbouring sets under all possible treatments.

  Now we present an alternative which instead relies on the intersection of all the neighbouring sets under all possible treatments. Equivalently, consider the set of edges $j\rightarrow i$ such that $A_{ij}(\bZ)$ is a constant function independent of $Z$.

These edges continue to exist regardless of treatment assignments, and thus we call them conserved edges. Let us denote such a set of edges by $\mathcal{M}^c_i$.
Knowledge of a large enough set of pre-experiment edges that are conserved allows us to circumvent the difficulties posed by not observing edges under counterfactual treatments. 

\begin{tcolorbox}[enhanced,colback=white,colframe=black, coltitle=white, center title]
\vspace{-3ex}
\begin{align*}
\text{Conserved Set:} \;\; \mathcal{M}^c_i \subseteq \mathcal{N}_i(\bz) \; \forall i,\bz\label{ass:subset} \tag{\textbf{A6}}    
\end{align*}
\end{tcolorbox}

\begin{remark}
The existence of such conserved edges is analogous to the classical ``compliance'' assumption used for instrumental variables estimation \citep{angrist1996identification}.
\end{remark}

%which in this setting would amount to requiring that all pre-treatment edges exist after the treatment has been applied.


We propose to use the IV pseudo-inverse estimator of \cref{eq:piest}, but adjust the resulting estimate to account for the fact that it only covers a subset of the variables. Such a set is almost always undercomplete by construction.
However, we do not need the entire vector $c_{i}$. Instead, we care only about the total treatment effect, which is $c_i^\top\vec{1}$. Under certain assumptions, the estimate obtained by using the undercomplete pseudo-inverse can be adjusted to be unbiased.

One such assumption is that of homogeneous neighbours (\textbf{A7}). Under this assumption $c_{ij}$ does not depend on $j$. Hence, this is also called anonymous interference, as the effect does not depend on the identity of the neighbour.

\begin{tcolorbox}[enhanced,colback=white,colframe=black, coltitle=white, center title]
\vspace{-3ex}
\begin{align*}
\text{Anonymous Interference:} \;\; c_{ij} = c_{ij'}   \forall j,j' \in \mathcal{N}_i\setminus i\label{ass:anon} \tag{\textbf{A7}}    
\end{align*}
\end{tcolorbox}
\begin{remark}
$c_{ij}$ can still depend on $i$, so we still have some heterogeneity.
\end{remark}




Let $C_i = \frac{1}{p}\sum_j   Z_j  A_{ij}$, then \begin{align}\E[C_i] = \sum_j \frac{1}{p} \E[Z_j A_{ij}] = \sum_j \E[A_{ij}|Z_j=1]
\label{eqn:insight}
\end{align}

One key result (see the Appendix) is that, if we use $\mathcal{M}^c_i$ as the instrument set, the pseudo-inverse provides an unbiased estimate of the indirect effect of the nodes in $\mathcal{M}^c_i$. That is, we have $\E[\hat{c}_i^\top \vec{1}] = \sum_{j \in \mathcal{M}^c_i} c_{ij} \E[A_{ij}(1)]$, which under anonymity is just $c_{i} \sum_{j \in \mathcal{M}^c_i}\E[A_{ij}(1)]$, and which under conserved edges further becomes $c_{i} |\mathcal{M}^c_i|$. Thus we can rescale this estimate by $C_i / |\mathcal{M}^c_i|$ and add the direct-effect term to get an unbiased estimate of $\tau_i$.

\begin{align}
\label{eq:te_full}
\hat{\tau}_{\text{UIV}} &= \frac{1}{n} \sum_i Y_i \biggl[ \left(\frac{z_i}{p} - \frac{1-z_i}{1-p}\right) + \\
&\;\; \sum_{j \in \mathcal{M}^c_i} \left(  \frac{z_j}{p} - \frac{(1-z_j)}{(1-p)} \right)\left(\dfrac{\sum_{k}   z_k A_{ik}}{p\,|\mathcal{M}^c_i|}  \right)\biggr].
\end{align}

\begin{prop}
Under assumptions \textbf{A1-4, A6-7}, $\hat{\tau}_{\text{UIV}}$ is an unbiased estimate of the treatment effect $\tau$.
\end{prop}
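
To illustrate the rescaling with hypothetical numbers (chosen only for this example), suppose node $i$ has $|\mathcal{M}^c_i| = 2$ conserved in-neighbours, a common spillover coefficient $c_i = 0.3$, and $\E[C_i] = \sum_j \E[A_{ij} \mid Z_j = 1] = 5$. Then, at the level of expectations,
\[
\underbrace{c_i\,|\mathcal{M}^c_i|}_{\text{pseudo-inverse target}} = 0.6,
\qquad
\frac{\E[C_i]}{|\mathcal{M}^c_i|} = \frac{5}{2},
\qquad
0.6 \times \frac{5}{2} = 1.5 = c_i \sum_j \E[A_{ij} \mid Z_j = 1].
\]
This back-of-the-envelope check ignores any dependence between $C_i$ and the pseudo-inverse estimate; the formal argument is the one referred to in the Appendix.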

We would like to draw the reader's attention to a crucial detail. As mentioned before, $\hat{\tau}_{OIV}$ is very similar to the HATE estimator of \citet{sussman2017elements}. Similarly, $\hat{\tau}_{UIV}$ is a scaled version of the same estimator. The critical difference lies in the set of neighbours used: under treatment-dependent networks, the neighbourhood itself becomes a function of treatment, and using the observed neighbourhood causes errors. How $\hat{\tau}_{OIV}$ and $\hat{\tau}_{UIV}$ handle this is discussed in more detail in Appendix A.2.
    

\begin{remark}
    In Appendix A.1, we derive bounds for the variance of the UIV and OIV estimator which can be used to provide conservative intervals for a Wald-style hypothesis test \citep{wasserman2006all}.
\end{remark}




\begin{remark}
We present another estimator based on the insight from (\cref{eqn:insight}) in the Appendix. 
This estimator, while efficient and with quite low variance, requires multiple independent trials.  Due to these conditions, this estimator is not applicable for many real datasets where we conduct the experiment once. That said for certain applications, researchers have access to baseline results \citep{YuCortezEichhorn22} which can be used as a trial.
\end{remark}

%This estimator, while efficient and with quite low variance, requires multiple trials. Furthermore, it assumes independence between trials. For additional details on this estimator we refer the readers to \citet{YuCortezEichhorn22}.
%from the control algorithm can be obtained , where the 



\begin{figure*}[th!]
    \centering
    \begin{subfigure}[b]{0.25\textwidth}
    \centering
    \includegraphics[width=\textwidth]{figsn/er-ratio-deg2_graph_aware.pdf}
    \includegraphics[width=\textwidth]{figsn/SBM-ratio-deg2_graph_aware.pdf}
    \caption{Direct/indirect effects: $r$}  \label{fig:ratioER}
    \end{subfigure}
    ~
    \begin{subfigure}[b]{0.25\textwidth}
    \centering
    \includegraphics[width=\textwidth]{figsn/er-size-deg2_graph_aware.pdf}
    \includegraphics[width=\textwidth]{figsn/SBM-size-deg2_graph_aware.pdf}
    \caption{Population size: $n$}  \label{fig:sizeER}
    \end{subfigure}
    ~
    \begin{subfigure}[b]{0.25\textwidth}
    \centering
    \includegraphics[width=\textwidth]{figsn/er-eprob-deg2_graph_aware.pdf}
    \includegraphics[width=\textwidth]{figsn/SBM-eprob-deg2_graph_aware.pdf}
     \caption{Treatment dependence: $e$}  \label{fig:pER}
    \end{subfigure}
    \vspace{-5pt}
    \begin{subfigure}[b]{0.9\textwidth}
    \centering
    %[trim={left bottom right top},clip]
    \includegraphics[width=0.8\textwidth,trim={0 0.2cm 0 0.5cm},clip]{figsn/legend1-1.pdf}
    \end{subfigure}
    \vspace{-25pt}
    \caption{Plots visualizing the performance  of various GATE estimators under Bernoulli design on
Erdős-Rényi networks (first row) and SBM networks (second row). The lines represent the empirical relative bias, i.e., $\frac{\hat{\tau} - \tau}{\tau}$ of the estimators across different settings, with the shaded width corresponding to the experimental standard error.  \label{fig:erdos_combined}}
\end{figure*}
\section{Experiments}



\subsection{Synthetic Graphs}
In this section, we experimentally demonstrate the validity of our proposed methods on synthetic data obtained from a model which satisfies our assumptions exactly. We experiment with both Erdős-Rényi (ER) graphs and stochastic block model (SBM) graphs to compare the performance of our estimator with other estimators. We simulate 100 different random graphs and run repeated experiments on each graph with random treatment assignments.
We set an independence parameter $e$ which determines the fraction of the edges that do not show treatment-dependent behaviour. Specifically, each treatment-dependent edge acts as a Bernoulli variable and is activated if its source node has treatment 1. A subset of the non-varying neighbourhood is taken as the conserved edges ($\mathcal{M}^c_i$), while the base network itself is taken to be the superset neighbourhood ($\mathcal{M}_i$). The potential outcomes $Y_i(\bz)$ are obtained by applying a function $g$ to the exposure and adding mean-zero noise. The exposures are computed using the procedure in \citet{YuCortezEichhorn22}. Across experiments, we vary the treatment probability $p$, the size of the graphs $n$, and the strength of interference $r$ to assess the efficacy of estimation across different ranges of parameters. Similar to \citet{YuCortezEichhorn22}, we measure the strength of interference $r$ as the ratio of norms of the self or direct influence and the indirect influence (more details in \cref{apx:synth}).
%$\phi_{i,j}$ i.e. $r = \frac{1}{n} \sum_i \frac{ \sum_{ j \in \N_i \setminus i} |\phi_{i,j}|}{|\phi_{i,i}| |\N_i|}$

%For the treatment allocation we employed a Bernoulli randomized design with probability of being allocated to treatment group as $p$.


%Using, we compare the performance of our estimator with existing estimators. Using an Erdős-Rényi model, we generate random directed graphs of $n$ nodes for a population of $n$ individuals.  










%\textbf{Baselines}
We gauge the effectiveness of $\mex$ by benchmarking it against commonly employed estimators: polynomial regression (Poly), ReFeX \citep{han2023modelbased}, and the difference-in-means (DM) estimator ($\hat{\tau}_{\text{SUTVA}}$). %Except for the DM model, all other models need exact neighbourhoods, and so we use them in an oracle setting, i.e., they have access to the true graph. 
Due to the size of the neighbourhoods, Horvitz-Thompson estimators failed to yield meaningful results in these trials.
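
For concreteness, the DM baseline can be written as below. This is a sketch assuming the usual sample-mean form; the symbols $z_i \in \{0,1\}$ (treatment indicator of unit $i$) and $Y_i$ (its observed outcome) are notation assumed here for illustration:
\[
\hat{\tau}_{\text{SUTVA}} = \frac{\sum_{i=1}^{n} z_i Y_i}{\sum_{i=1}^{n} z_i} - \frac{\sum_{i=1}^{n} (1-z_i) Y_i}{\sum_{i=1}^{n} (1-z_i)}.
\]
This contrast compares treated and control means directly and ignores the network, so it is expected to be reliable only in the absence of interference.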
%Therefore, we have opted not to present their outcomes in the synthetic experiment's results section.

The results are presented in Figure \ref{fig:erdos_combined}. The first row contains results from the ER model. The figure shows that our estimator produces unbiased estimates, whereas the other methods produce highly biased estimates. Note that in Figure \ref{fig:erdos_combined}a, when $r=0$, there is no interference, and hence most estimators are unbiased. However, as interference increases, these methods show strong bias. Second, for a given interference strength, our method exhibits consistency in the form of decreasing variance as the number of nodes increases. Finally, treatment dependence induces bias in all the other methods, while our estimator remains unbiased. Similar results are obtained for the SBM model.
%\ssh{Additional experiments (to be run on other graphs models like SW, SBM)}

%\ina{It's remarkable that we do better than poly even though that has the device graph (oracle). you should explain the source of bias for this method (and for the others).}



%One natural answer for addressing this problem is to use neural networks to approximate the actual exposure function. This can be modeled in our scenario by modeling the full joint distribution via neural networks. Unfortunately, however variational-EM methods are not identifiable in general \citep{redner1984mixture,dwivedi2020sharp}.


 


\subsection{Application: Assessing Impact of Banking Access Intervention}

Next, we demonstrate an application to observational data. We focus on the application mentioned in the introduction, namely providing access to financial accounts. We use the data from the field study conducted by \citet{prina2015banking,comola2015treatment} in the region around Pokhara in Nepal. The experiment involves a randomized trial of providing access to
formal savings accounts to a random sample of poor households. The authors surveyed all poor households with an adult and working female head to identify their social connections. The initial social network was sparse and minimally clustered. Half of the families were offered access to a savings bank account. After the treatment, another survey was conducted with the families. \citet{comola2015treatment} report that a significant fraction of treated units used the savings bank account. They also found a significant change in social connections, with around 50\% of the connections changing post-treatment. 
The outcomes $Y_i$ correspond to measured household consumption. The literature has shown strong peer effects for this variable \citep{cruwys2015social}. We used the intersection of the two networks as $\mathcal{M}^c$ for the UIV estimate and their union as $\mathcal{M}$ for the OIV estimate. 

As this is observational data, we do not know the ground-truth effect and therefore use the results of \citet{comola2015treatment} as a reference. 
\cref{fig:nepal} shows that our method provides estimates similar to the reference, whereas other interference-aware methods such as ReFeX, while better than the no-interference model, do not perform as well.

%\ssh{I have another similar application with this dataset on health outcomes, should i add that} \ina{I don't know that we'll have room in the main paper; I'd only include it if the results are great. we can always do an MLHC paper for just that application}



\begin{figure}
% trim={<left> <lower> <right> <upper>}
\includegraphics[width=0.45\textwidth,trim={0 0 0 0.625cm},clip]{figsn/box_nepal.pdf}
\captionof{figure}{ Estimates for GATE of financial access on household consumption for the \citet{prina2015banking} experiment. The box plot depicts the mean and the 95\% confidence interval. HT and ReFeX methods use post-treatment neighbourhoods, and Ref is the method from \citet{comola2015treatment}. \label{fig:nepal}}
\end{figure}






\section{Conclusion}
We highlight a major limitation of current interference-aware GATE methods: the standard HT estimate is biased when the interference network is treatment-dependent. We then provide two solutions to this problem by combining the ideas of pseudoinverse estimation with the concept of instrumental variables. We show that our estimators are unbiased and provide a statistical inference method. Finally, we experiment with both real and synthetic data to show the validity of our estimators. Our results have immediate implications for randomized trials in social networks, public health, and economics, where ignoring endogenous interference can lead to severely misleading conclusions.

A limitation of our work is that the variance of the estimate grows with the size of the neighbourhoods, and so for practical applications, one needs to balance the risk of higher variance against potential bias. Future research directions include incorporating temporal data and longitudinal studies.
\bibliography{mybib}
\include{appendix}
\end{document}

