\section{Introduction}
% outline
% - causal inference is critical to understand real world processes
% - in many cases we can't observe the cause, only noisly measure it
% - most prior works ignore this, using the measurements as the cause
% - we propose a method for cause-effect estimation in the popular IV model, when the cause is unobserved
% - contributions

Real-world data pose many problems for causal effect estimation. Unmeasured confounding, the existence of hidden common causes of a treatment $X$ and an outcome of interest $Y$, is a problem that lies at the heart of many applied sciences. Attempts to solve it have led to a variety of approaches, the most common of which are based on instrumental variables (IVs): an IV is an auxiliary variable $Z$ that is independent of $Y$ under a perfect intervention on $X$ \citep{pearl2009causality,hernan2020}, and that is predictive of $X$ but not caused by it.

A less commonly studied challenge arises when \emph{the treatment is not directly observed}. For instance, we may want to learn the effect of taking a drug ($X = 1$) versus not taking it ($X = 0$), where we incentivize patients to take it or not ($Z = 1$ vs.\ $Z = 0$). It is not necessarily the case that $X = Z$: patients take the drug at home rather than under supervision in a hospital, and so they may not comply with the incentive. This \emph{non-compliance} problem is compounded by the \emph{measurement error} problem: a self-reported measurement of taking ($M = 1$) or not taking ($M = 0$) the drug does not imply $X = M$, because the patient may be lying or simply forgetful. An instrumental variable approach to estimating an \emph{average treatment effect} (ATE) such as $\mathbb E[Y~|~do(X = 1)] - \mathbb E[Y~|~do(X = 0)]$ \citep{pearl2009causality} may fail to give reliable results if our data consist of records of $(Z, M, Y)$ but the assumption $X = M$ does not hold.

\begin{figure}[t]
    \centering
    \includegraphics[scale=0.5]{figures/results_fig.pdf}
    \caption{Comparison of the curves fitted by our method and by KIVM against the true curve, when only a corrupted measurement of the treatment $X$ is available. KIVM, a method we discuss in the sequel, ignores that the measurements of $X$ are corrupted by additive noise.}
    \label{fig:intro_comparison}
\end{figure}

A related issue arises when postulating latent \emph{constructs} as causes. In a well-known example, \citet{bollen1989} considers a model for the effect of the ``industrialization level'' of a country on its political freedom $Y$. We may operationalize this construct by postulating a space of possible interventions $Z$ on industrialization $X$ that keep the relation between $X$ and $Y$ invariant. However, $X$ remains observable only through indirect measurements $M$, such as GDP or the proportion of the labor force working in industry.
%Examples in social sciences such as health records and examples in econometrics such as GDP as a measure for economic health are cases of latent quantities being measured with error. 

What matters in both classes of indirect treatment measurement problems is that the causal relation $\mathbb E[Y~|~do(x)]$ is considered fundamental, with $\mathbb E[Y~|~do(m)]$ being zero, poorly defined, or of secondary interest. For instance, \emph{redefining} GDP may well have a genuine causal impact, but this intervention is not what motivates the study of the causal impact of industrialization levels. In particular, measurement mechanisms may change more easily than the relation between the putative cause and the outcome of interest: we may redefine GDP, or collect data where the phrasing and timing of our questions about a patient's compliance vary across communities, while assuming that the relation between $X$ and $Y$ is invariant. In a way, this measurement problem is a counterpart to why estimating intention-to-treat effects, i.e., $\mathbb E[Y~|~do(z)]$, is in many cases not the goal of an IV analysis, despite their policy-making implications.

The need to understand the effects of mismeasured quantities on other quantities of interest motivates the study of measurement error modeling \citep{carroll2006measurement_nonlinear,schennach_review,hernan2020}. Famously, even in the linear (noncausal) regression case, na\"ively regressing $Y$ on a noisy measurement of $X$ results in \emph{attenuation error}: the regression coefficient is underestimated in magnitude because of the measurement error \citep{carroll2006measurement_nonlinear}. An analogous phenomenon takes place when estimating causal effects. Figure~\ref{fig:intro_comparison} compares a kernel method that estimates an $X$-$Y$ dose-response curve while ignoring measurement error in $X$ against the curve found by the method we propose.
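
As a minimal illustration of attenuation, consider the textbook classical-error setting, which we assume here only for concreteness and which is not the general model studied in this paper. Let $Y = \beta X + \eta$ be a scalar linear model with $\eta$ uncorrelated with $X$, and let $M = X + U$ be a measurement whose noise $U$ has zero mean and variance $\sigma_U^2$ and is uncorrelated with both $X$ and $\eta$. The symbols $\beta$, $\eta$, $U$, $\sigma_X^2$, $\sigma_U^2$ and $\lambda$ are introduced for this example only, with $\sigma_X^2$ denoting the variance of $X$. The population least-squares slope of $Y$ on $M$ is then
\[
    \frac{\mathrm{Cov}(Y, M)}{\mathrm{Var}(M)}
    = \frac{\beta\,\sigma_X^2}{\sigma_X^2 + \sigma_U^2}
    = \lambda \beta,
    \qquad
    \lambda = \frac{\sigma_X^2}{\sigma_X^2 + \sigma_U^2} \in (0, 1],
\]
so the slope is shrunk towards zero by the factor $\lambda$, and the shrinkage grows with the noise variance $\sigma_U^2$.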

The nonlinear and confounded setting remains largely unexplored. \cite{schennach_review} suggests that, in general, three measurements are needed to identify the full joint distribution of the measurements and the latent variable. However, when we can make some assumptions on the error distribution, this number can be reduced. Furthermore, we are not interested in the full joint distribution with the latent variable $X$, but only in the parts needed as components of the IV regression model. To that effect, we assume that our problem satisfies the Markov properties of the graph in Figure~\ref{fig:merror_iv_graph}: we are interested in the structural function $f(x) \equiv \mathbb E[Y~|~do(x)]$, where observationally $Y = f(X) + \epsilon$, with the error term $\epsilon$ correlated with the treatment $X$. We assume that we have access to at least two treatment measurements, $M$ and $N$, and an instrumental variable $Z$.
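
To make this structure concrete, the following sketch records some constraints that can be read off the graph in Figure~\ref{fig:merror_iv_graph} by d-separation. It is meant as an illustration of the assumptions, not as a statement of the identification result. Since $M$ and $N$ are children of $X$ alone,
\[
    M \perp\!\!\!\perp N \mid X,
    \qquad
    (M, N) \perp\!\!\!\perp (Z, Y, \epsilon) \mid X,
\]
and since every path between $Z$ and $\epsilon$ passes through a collider ($X$ or $Y$) that is not conditioned on, $Z \perp\!\!\!\perp \epsilon$. Under the additional normalization $\mathbb E[\epsilon] = 0$, which is implicit in writing $f(x) = \mathbb E[Y~|~do(x)]$ together with $Y = f(X) + \epsilon$, this gives the usual IV moment condition
\[
    \mathbb E[Y~|~Z = z] = \mathbb E[f(X)~|~Z = z] + \mathbb E[\epsilon~|~Z = z] = \mathbb E[f(X)~|~Z = z].
\]
The right-hand side depends on the conditional distribution of the unobserved $X$ given $Z$, which is why the measurements $M$ and $N$ are needed.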

\begin{figure}[t]
    \centering
    \begin{tikzpicture}[roundnode/.style={circle, draw=black!100, fill=black!15, very thick, minimum size=0.5mm}, normalnode/.style={circle, draw=black!100, fill=black!0, very thick, minimum size=0.5mm},
    squarednode/.style={rectangle, draw=red!60, fill=red!5, very thick, minimum size=5mm},]
    \node[roundnode] (x) at (0,0) {$X$};
    \node[normalnode] (y) at (2,0) {$Y$};
    \node[normalnode] (z) at (-2,0) {$Z$};
    \node[roundnode] (eps) at (1,2) {$\epsilon$};
    \node[normalnode] (m) at (-1, -2) {$M$};
    \node[normalnode] (n) at (1, -2) {$N$};
    
    \draw[-{Triangle[length=3mm, width=2mm]},line width=1pt] (x) -- (y) node[above, midway] {$f$};
    \draw[-{Triangle[length=3mm, width=2mm]},line width=1pt] (z) -- (x);
    \draw[-{Triangle[length=3mm, width=2mm]},line width=1pt] (eps) -- (y);
    \draw[-{Triangle[length=3mm, width=2mm]},line width=1pt] (eps) -- (x);
    \draw[-{Triangle[length=3mm, width=2mm]},line width=1pt] (x) -- (m);
    \draw[-{Triangle[length=3mm, width=2mm]},line width=1pt] (x) -- (n);
    \end{tikzpicture}

    \caption{An instrumental variable model in which the treatment $X$ and the outcome $Y$ are confounded, and in which the treatment is unobservable but has indirect measurements $M$ and $N$. Shaded nodes are unobserved.}
    \label{fig:merror_iv_graph}
\end{figure}

Our contribution is threefold:

\begin{itemize}
    \item we propose an estimator for the structural function $f(x)$ that does not require latent variable modeling. The resulting method can be applied without restrictive assumptions on the likelihood, such as the requirement of Gaussian error terms;
    \item in particular, we provide a method to learn the conditional mean embedding \citep{Muandet17:KME} of a latent variable distribution, which can be applied to many two-stage IV settings;
    \item we propose a way to exploit the connection between characteristic function methods and kernel methods, which may be applied to many settings outside of measurement error modeling (see Section~\ref{subsec: kme_cf}).
\end{itemize}
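
For readers less familiar with the objects named in the last two items, the following standard definitions may help. The notation is assumed here for illustration only and is not necessarily that of later sections: $k$ is a kernel with feature map $\phi$ and reproducing kernel Hilbert space $\mathcal H$, $P$ is a distribution on $\mathbb R^d$, and $\psi$, $\Lambda$ and $\varphi_P$ are as defined below. The conditional mean embedding of $X$ given $Z = z$ and the characteristic function of $P$ are
\[
    \mu_{X \mid Z = z} = \mathbb E[\phi(X)~|~Z = z] \in \mathcal H,
    \qquad
    \varphi_P(\omega) = \mathbb E_{X \sim P}\big[e^{i \omega^\top X}\big].
\]
If $k$ is translation invariant, $k(x, x') = \psi(x - x')$ with $\psi$ continuous and positive definite, Bochner's theorem gives $\psi(\delta) = \int e^{i \omega^\top \delta}\, d\Lambda(\omega)$ for a finite nonnegative measure $\Lambda$. The kernel mean embedding $\mu_P = \mathbb E_{X \sim P}[k(X, \cdot)]$ then satisfies
\[
    \mu_P(x') = \mathbb E_{X \sim P}\big[\psi(X - x')\big]
    = \int \varphi_P(\omega)\, e^{-i \omega^\top x'}\, d\Lambda(\omega),
\]
so the embedding is determined by the characteristic function on the support of $\Lambda$. This is one well-known link between the two families of methods; the construction in Section~\ref{subsec: kme_cf} may differ in its details.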


