\section{Experiments}
\begin{figure*}[t]
\centering
\includegraphics[width=\textwidth]{figures/new_measurement_error_linear_sigmoid_30runs-1.png}
% \begin{minipage}[t]{.24\textwidth}
% \centering
% \includegraphics[scale=0.32]{figures/Linear_MixtureGaussian.pdf}
% \caption*{(a) Linear - Gaussian}
% \end{minipage}
% \begin{minipage}[t]{.245\textwidth}
% \centering
% \includegraphics[scale=0.32]{figures/Linear_Gaussian.pdf}
% \caption*{(b) Linear - MoG}
% \end{minipage}
% \begin{minipage}[t]{.245\textwidth}
% \centering
% \includegraphics[scale=0.32]{figures/Sigmoid_MixtureGaussian.pdf}
% \caption*{(c) Sigmoid - Gaussian}
% \end{minipage}
% \begin{minipage}[t]{.245\textwidth}
% \centering
% \includegraphics[scale=0.32]{figures/Sigmoid_Gaussian.pdf}
% \caption*{(d) Sigmoid - MoG}
% \end{minipage}
% \begin{minipage}[t]{.1\textwidth}
% \centering
% \includegraphics[scale=0.4]{figures/legend.pdf}
% \end{minipage}
% \caption{Step 2 for training MerrorKIV - finding the $X1$ and $\lambda_{X}^{(s_1)}$}
\caption{Out-of-sample MSEs ($\log_{10}$ scale) for the linear and sigmoid designs.}
\label{fig:linear_sigmoid}
\end{figure*}





\begin{figure*}[t]
\centering
\includegraphics[width=0.8\textwidth]{figures/new_measurement_error_demand_30runs-1.png}
% \begin{minipage}[t]{.28\textwidth}
% \centering
% \includegraphics[scale=0.38]{figures/demand_rho=0.25_MixtureGaussian.pdf}
% \caption*{(a) $\rho=0.25$}
% \end{minipage}
% \begin{minipage}[t]{.28\textwidth}
% \centering
% \includegraphics[scale=0.38]{figures/demand_rho=0.5_MixtureGaussian.pdf}
% \caption*{(b) $\rho=0.5$}
% \end{minipage}
% \begin{minipage}[t]{.28\textwidth}
% \centering
% \includegraphics[scale=0.38]{figures/demand_rho=0.9_MixtureGaussian.pdf}
% \caption*{(c) $\rho=0.9$}
% \end{minipage}
% \begin{minipage}[t]{.14\textwidth}
% \centering
% \includegraphics[scale=0.7]{figures/legend_tall.pdf}
% \end{minipage}
% \caption{Step 2 for training MerrorKIV - finding the $X1$ and $\lambda_{X}^{(s_1)}$}
\caption{Out-of-sample MSEs ($\log_{10}$ scale) for the demand design.}
\label{fig:demand_mog}
\end{figure*}

% what we are comparing with: KIV-oracle, KIV-M, KIV-MN
In this section we evaluate the empirical performance of MEKIV across multiple designs and against baselines. In particular, we compare to three baselines: (A) KernelIV \cite{singh2019kernel} with the ground-truth $X$ provided by an oracle (KIV-Oracle); (B) KernelIV taking $M$ as the true treatment observations (KIV-M); and (C) KernelIV taking $(M+N)/2$ as the true treatment observations (KIV-MN), since averaging two measurements with independent errors reduces the error variance.
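To make the motivation for KIV-MN concrete, consider the following short calculation. It is a sketch under illustrative assumptions: we write $M = X + \Delta M$ and $N = X + \Delta N$, where the symbols $\Delta M$, $\Delta N$ and $\sigma^2$ are notation introduced only for this remark, and we assume the two errors are uncorrelated with common variance $\sigma^2$. Then
\[
\frac{M+N}{2} = X + \frac{\Delta M + \Delta N}{2},
\qquad
\mathrm{Var}\left(\frac{\Delta M + \Delta N}{2}\right) = \frac{\sigma^2 + \sigma^2}{4} = \frac{\sigma^2}{2}.
\]
Under these assumptions, averaging halves the error variance but does not eliminate it, so KIV-MN still regresses on a corrupted treatment.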

% what datasets do we consider. linear, sigmoid, demand
We run each estimator on three designs. The \textit{linear} design \cite{chen2017optimal} involves learning the structural function $f(x) = 4x - 2$, where $X$ is unobserved and we are given only corrupted measurements $(M, N)$ of the treatment, a continuous instrument $Z$, and observations of the outcome $Y$, which is confounded with the true treatment $X$. The \textit{sigmoid} design \cite{chen2017optimal} involves learning the structural function $f(x) = \ln(|16x - 8| + 1)\cdot \mathrm{sgn}(x-0.5)$ under an otherwise identical data-generating process. The \textit{demand} design \cite{Hartford17:DIV} involves learning the demand function $h(p,t,s) = 100 + (10+p) \cdot s \cdot \psi(t) - 2p$, where $\psi(t)$ is a complicated nonlinear function. A data tuple including the ground-truth treatments consists of $(Y, P, T, S, C)$, where $Y$ is sales, $P$ is price, $T$ is time of year, $S$ is customer sentiment (a discrete variable), and $C$ parameterizes the supply cost shift. A parameter $\rho \in \{0.25, 0.5, 0.9\}$ calibrates the level of confounding of $P$ by supply-side market forces. We set the treatments to $X \defeq (P, T, S)$ and the instruments to $Z \defeq (C, T, S)$.

Since $X$ is observed in the originally proposed designs, we construct $M$ and $N$ from $X$ and mask $X$ from all algorithms except KIV-Oracle. For the demand design, where $X$ is 3-dimensional, we mask only the dimension corresponding to $P$. For each design we construct $M$ and $N$ by adding noise to $X$. We analyze the robustness of MEKIV along two axes. First, we vary the measurement error distribution: we implement a \textit{Gaussian} additive noise design and a multi-modal \textit{Mixture of Gaussian} additive noise design in which we mix two Gaussian distributions, centred two standard deviations of $X$ away from 0 on either side. Second, for each measurement error distribution, we vary its standard deviation. For both noise designs, we set the standard deviation of the Gaussian(s) to be $\{0.5, 1, 2\}$ times the standard deviation of the ground-truth $X$.
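One way to write this construction down is sketched here, under assumptions the text above does not fix. Let $\sigma_X$ denote the standard deviation of the masked coordinate of $X$, let $c \in \{0.5, 1, 2\}$ be the error level, and write $M = X + \Delta M$ and $N = X + \Delta N$ with $\Delta M$ and $\Delta N$ drawn independently. The symbols $\sigma_X$, $c$, $\Delta M$, $\Delta N$ are notation introduced for this illustration, and the equal mixture weights are an illustrative assumption rather than a stated detail. Each error is then drawn from
\[
\mathcal{N}\left(0,\; c^2 \sigma_X^2\right)
\quad \mbox{(Gaussian)},
\qquad
\frac{1}{2}\,\mathcal{N}\left(-2\sigma_X,\; c^2 \sigma_X^2\right) + \frac{1}{2}\,\mathcal{N}\left(2\sigma_X,\; c^2 \sigma_X^2\right)
\quad \mbox{(Mixture of Gaussian)}.
\]
With equal weights the mixture has mean zero and total variance $c^2\sigma_X^2 + (2\sigma_X)^2 = (c^2 + 4)\,\sigma_X^2$, so at the same level $c$ the bimodal errors are considerably more dispersed than the Gaussian ones.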


% what we do to augment the datasets. 1. varying measurement error level 2. varying measurement error type



% for each algorithm, design, measurement error level and measurement error type we implement 10 simulations and calculate mse wrt true structural function h.
For the linear and sigmoid designs, we run 30 simulations for each combination of algorithm, measurement error distribution (merror type), and measurement error standard deviation (merror level). For the demand design, due to time constraints, we run 30 simulations for the Mixture of Gaussian measurement error distribution and 10 for the Gaussian distribution, for each algorithm and measurement error standard deviation. We calculate the MSE with respect to the true structural function ($f$ for the linear and sigmoid designs, $h$ for the demand design). Figures~\ref{fig:linear_sigmoid} and~\ref{fig:demand_mog}, together with Figure~\ref*{fig:demand_gaussian} (Supplementary Materials), plot the results for each design, measurement error distribution, and measurement error level. We expect KIV-Oracle to perform best among all methods, and we view its performance as an upper bound on what the other algorithms can achieve (equivalently, a lower bound on their MSE). MEKIV outperforms KIV-M and KIV-MN in the highest measurement error level setting and is robust to non-classical measurement error, as demonstrated by its performance under Mixture of Gaussian error.
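For concreteness, the reported metric can be written as follows. The symbols $\hat{f}$ for a fitted estimator and $x_1, \dots, x_n$ for the out-of-sample evaluation points are notation introduced here for illustration; how the evaluation points are chosen is not specified in this section.
\[
\mathrm{MSE}(\hat{f}) = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{f}(x_i) - f(x_i)\right)^2 ,
\]
with $f$ replaced by $h$ in the demand design. The figures display $\log_{10} \mathrm{MSE}(\hat{f})$, so a difference of one unit on the vertical axis corresponds to a tenfold change in MSE.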


% \begin{figure*}[t]
% \centering
% \begin{minipage}[t]{.29\textwidth}
% \centering
% \includegraphics[scale=0.2]{figures/demand_rho-0.25_multi_gaussian_s.png}
% \end{minipage}
% \begin{minipage}[t]{.29\textwidth}
% \centering
% \includegraphics[scale=0.2]{figures/demand_rho-0.5_multi_gaussian_s.png}
% \end{minipage}
% \begin{minipage}[t]{.29\textwidth}
% \centering
% \includegraphics[scale=0.2]{figures/demand_rho-0.9_multi_gaussian_s.png}
% \end{minipage}
% \begin{minipage}[t]{.1\textwidth}
% \centering
% \includegraphics[scale=0.35]{figures/legend.png}
% \end{minipage}
% % \caption{Step 2 for training MerrorKIV - finding the $X1$ and $\lambda_{X}^{(s_1)}$}
% \caption{Demand design: Mixture of Gaussian.}
% \label{fig:algorithm}
% \end{figure*}

% discuss results. our method is robust under increasing measurement error level and different measurement error type. 