\documentclass{article}

\usepackage{neurips_2023}
\usepackage{amsfonts} 
\usepackage{hyperref}
\usepackage{xcolor}
\newcommand{\jin}[1]{\textcolor{blue}{#1}}
\newcommand{\yuta}[1]{\textcolor{red}{#1}}

\begin{document}



Thank you for your constructive comments and suggestions, which will help us improve the paper. We will carefully incorporate them in the revision. Below, each of your comments is stated first, followed by our response.

Comment 1:

< My main concern is the correctness of the kernel estimator definition.
In line 229 and (21), $<G_1,\psi(z)>_{H_z}$ is a scalar while $\mathbb{E}[\pi(X,W)|Z=z]$ is a function of $X$ and $W$. 
So it does not make sense to write that the former equals the latter. 
I worry that the RKHSs have not been defined properly.

In (25), I would expect to see the antiderivative kernel, similar to how we saw the antiderivative basis in the earlier estimators.

This is my main reason for recommending rejection. 
I suspect that fixing this issue may require substantial work-more than is typical in a conference rebuttal.
If this was fully fixed, as well as the items below (which are more typical for a conference rebuttal), I would recommend acceptance in the future.

Our response:

We believe the RKHSs are defined properly.
First, the inner product $\langle G_1,\psi(z)\rangle_{H_Z}$ in line 229 and (21) is not a fixed scalar; it is a function of $z$.

Briefly, by the representer theorem, $G_1$ takes the linear-combination form $\sum_{j=1}^{N_1}\alpha_j \psi(z_j^{(1)})$.
Then $\langle G_1,\psi(z)\rangle_{H_Z}$ can be written as

$$
\sum_{j=1}^{N_1}\alpha_j \langle\psi(z_j^{(1)}),\psi(z)\rangle_{H_Z},
$$

which, by the kernel trick, coincides with

$$
\sum_{j=1}^{N_1}\alpha_j k_Z(z_j^{(1)},z).
$$

This is a function of $z$, exactly as in standard kernel ridge regression.
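
To make this concrete, consider a purely illustrative case that is not taken from the paper: assume $N_1=2$, a scalar instrument, and a Gaussian kernel $k_Z(z,z')=\exp(-(z-z')^2/2)$. Under these assumptions the inner product becomes

$$
\langle G_1,\psi(z)\rangle_{H_Z} = \alpha_1 \exp\!\big(-(z-z_1^{(1)})^2/2\big) + \alpha_2 \exp\!\big(-(z-z_2^{(1)})^2/2\big),
$$

whose value changes as $z$ varies, so it is a function of $z$ rather than a single fixed number.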

Second, the notation $\mathbb{E}[\cdot|Z=z]$ stands for conditional expectation (defined in Lines 40-46). Therefore, $\mathbb{E}[\pi(X,W)|Z=z]$ with capital $X, W$ denotes the expectation over $X$ and $W$: $\mathbb{E}_{X,W}[\pi(X,W)|Z=z] = \int_{\Omega_{W}}\int_{\Omega_X} \pi(x,w)\, p(x,w|Z=z)\, dx\, dw$. This is a function of $z$, not a function of $X$ and $W$.

Thus, in Line 229, both sides are functions of $z$, and the RKHSs are defined properly.
The expression $\mathbb{E}[\pi(X,W)|Z=z]=\langle G_1,\psi(z)\rangle_{H_Z}$ is also used in standard kernel ridge regression problems, and our notation follows Kernel IV (Singh et al., 2019). For details on the representer theorem, see
\url{https://en.wikipedia.org/wiki/Representer_theorem}.

In (25), $k_{X,W}$ is indeed an antiderivative kernel function, and the feature map $\pi(x,w)$ is an antiderivative function. The details are in Appendix A.2, where
$\pi(x,w)$ is represented as an antiderivative function in line 48. We apologize for the confusion. We will revise the sentence in line 226 of the paper

``Denote the feature map
$\pi: \Omega_{X,W}  \rightarrow H_{X,W}, (x,w) \mapsto k_{X,W}(x,w,\cdot,\cdot)$ and $\psi: \Omega_Z  \rightarrow H_Z, z \mapsto k_Z(z,\cdot)$.''

to the following:

``Denote the feature map
$\eta: \Omega_{X,W}  \rightarrow H_{X,W}, (x,w) \mapsto k'_{X,W}(x,w,\cdot,\cdot)$ and $\psi: \Omega_Z  \rightarrow H_Z, z \mapsto k_Z(z,\cdot)$.

In addition, we denote the antiderivative feature function
$\pi(x,w)=-\int \eta(x,w)\, dx$ and the antiderivative kernel function $k_{X,W}(x,w,x',w')= \int \int k'_{X,W}(x,w,x',w')\, dx\, dx'$.''

Note that the antiderivative kernel function can be computed easily and explicitly by taking the antiderivative of the kernel function, which is justified by Fubini's theorem:

$$
\langle\pi(x,w),\pi(x',w')\rangle=\int\int\langle\eta(x,w),\eta(x',w')\rangle\, dx\, dx'.
$$
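
As a simple illustration, under assumptions that are ours and do not describe the kernel used in the paper, suppose $x$ is scalar, the kernel has the hypothetical product form $k'_{X,W}(x,w,x',w') = x x'\, k_W(w,w')$ for some kernel $k_W$ on $\Omega_W$, and all integration constants are set to zero. Then

$$
k_{X,W}(x,w,x',w') = \int\int x x'\, k_W(w,w')\, dx\, dx' = \frac{x^2}{2}\,\frac{(x')^2}{2}\, k_W(w,w'),
$$

so the antiderivative kernel is available in closed form. The minus sign in the definition of $\pi$ appears once in each argument of the inner product and therefore cancels.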

We have provided a detailed derivation of the RKHS CAPCE estimator in Appendix A.2,
and we hope it addresses your concern about the RKHS estimator.

Comment 2:

< Some closely related CAPCE estimators exist for similar settings, which affects the framing.
Under the stronger separability assumption for NPIV models, I believe that (rather strong) sieve estimation results for CAPCE are contained in “Optimal sup-norm rates and uniform inference on nonlinear functionals of nonparametric IV regression.”
The key difference appears to be that this paper considers a weaker separability assumption. That suggests a different framing.

For the simpler setting without unobserved confounding, I believe that kernel estimation results for CAPCE are contained in “Kernel methods for causal functions: dose, heterogeneous, and incremental response curves.”
The key difference appears to be that this paper considers the additional challenge of unobserved confounding. Again, that suggests a different framing.

Our response:

Thank you for pointing us to these two papers, (Chen and Christensen, 2015) and (Singh et al., 2020).
We will acknowledge and cite them.
One key difference from the two papers is that this work estimates the derivative, CAPCE $\mathbb{E}[\partial_x Y_x|w]$, \emph{directly} rather than indirectly via estimating the structural function $\mathbb{E}[Y_x|w]$. As a consequence, the estimation methods in the two papers neither generalize to nor contain the estimation methods in this paper, and our methods do not reduce to theirs either.

Specifically, Chen and Christensen (2015) estimate the derivatives of the structural function $h_0$ by first estimating $h_0$ itself using instrumental variables. This work estimates the derivative of the structural function (CAPCE) directly, by solving a different integral equation from the one in (Chen and Christensen, 2015). Our methods are not contained in, and do not reduce to, the methods in (Chen and Christensen, 2015). The weaker separability assumption is made possible by avoiding estimation of the structural function.

The setting of (Singh et al., 2020) is quite different from ours: estimating causal functions without unobserved confounding is very different from estimating CAPCE in the IV setting with unobserved confounding. Accordingly, our methods are not contained in, and do not reduce to, the methods in (Singh et al., 2020).

Comment 3:

< The notation in the paper is heavy, but the final estimator is actually straightforward.

The procedure is: regress $Y$ on $Z$, regress the anti-derivative basis on $Z$, then regress those results on each other to get the coefficient. Evaluate this coefficient with the original basis. In the end, the proposed estimators are all ridge solutions calculated using the anti-derivative basis and then evaluated at the original basis.

Our response:

We agree with your summary of the procedure. In the end, the proposed estimation methods follow a fairly standard process. Still, the proposed identification method for CAPCE is novel, and the derivation and analysis of the corresponding estimators are nontrivial.

Questions:

< Can the concerns about the kernel estimator be addressed? For me, this is the difference between recommending acceptance versus rejection.

Our response:

We hope our responses above address your concern about the kernel estimator.

\end{document}