\documentclass{uai2024} 
\usepackage{amsfonts} 
\usepackage{hyperref}
\usepackage{xcolor}
\newcommand{\jin}[1]{\textcolor{blue}{#1}}
\newcommand{\yuta}[1]{\textcolor{red}{#1}}

\begin{document}



Thank you for your constructive comments and suggestions. They are very helpful for improving our paper, and we will carefully incorporate them into the revised version. Below, each of your comments is stated first, followed by our response.

>Comment: 
What are the meanings of other curves in Figure 2(a)?

Our response: 
The dot-dashed curves surrounding the P-CAPCE and PTSLS curves show the corresponding 95\% confidence intervals.

>Comment: 
In Theorem 4.5, it states that the RKHS CAPCE estimator converges pointwise to CAPCE when $\lambda_3=0$, can we set $\lambda_3$ to be zero in the optimization problem (22)?

Our response: No. When $\lambda_3=0$, the matrix $\hat{O}\hat{O}^T+N_2\xi {K}_{(X,W)^{(1)}(X,W)^{(1)}}$ in Eq. (24) can be singular or ill-conditioned, so computing its inverse is numerically unstable.
Regularization introduces bias, but it also reduces variance; in practice, the regularization parameter must be chosen to balance this bias-variance trade-off.
This is a common issue in kernel methods.
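To illustrate why the unregularized inverse is unstable, consider a simplified setting (a simplification for illustration, not the exact form of Eq. (24)) in which the kernel Gram matrix is replaced by the identity and $c>0$ denotes the regularization weight. Since $A=\hat{O}\hat{O}^T$ is an $n\times n$ symmetric positive semidefinite matrix, it has an eigendecomposition $A=U\,\mathrm{diag}(\sigma_1,\dots,\sigma_n)\,U^T$ with $U$ orthogonal and $\sigma_k\geq 0$, so that
$$(A+cI)^{-1}=U\,\mathrm{diag}\Big(\frac{1}{\sigma_1+c},\dots,\frac{1}{\sigma_n+c}\Big)U^T,\qquad \|(A+cI)^{-1}\|_2=\frac{1}{\sigma_{\min}+c}\leq\frac{1}{c}.$$
With $c=0$, the norm becomes $1/\sigma_{\min}$, which grows without bound as $\sigma_{\min}\to 0$ and is undefined when $A$ is singular; small estimation errors in $\hat{O}$ can then be amplified arbitrarily.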

>Comment: 
There may be some typos.
For example, in the definition of $\mu(z)$ in Theorem 3.1, should we swap the positions of $z_0$ and $z$? In Equation (22), do we need to swap the positions of $G_1$ and $G_2$?

Our response: These are not typos; the equations in the paper are correct. To illustrate, the following derivation substitutes $g(x)=\sum_{j=1}^J\beta_j\phi_j(x)$ into Eq. (3) for the case without covariates $W$ and shows how the positions of $z$ and $z_0$ become swapped. We have
$$E[Y|Z=z_0]-E[Y|Z=z]=\int_{\Omega_X} \{p(X\leq x|Z=z)-p(X\leq x|Z=z_0)\}\sum_{j=1}^J\beta_j\phi_j(x) dx$$
$$\Leftrightarrow E[Y|Z=z_0]-E[Y|Z=z]=\sum_{j=1}^J\beta_j\int_{\Omega_X} \{p(X> x|Z=z_0)-p(X> x|Z=z)\}\phi_j(x) dx$$
$$\Leftrightarrow E[Y|Z=z_0]-E[Y|Z=z]
    =\sum_{j=1}^J\beta_j\{E[\varphi_j(X)|Z=z_0]-E[\varphi_j(X)|Z=z]\}$$
$$\Leftrightarrow E[Y|Z=z]-E[Y|Z=z_0]=\sum_{j=1}^J\beta_j\{E[\varphi_j(X)|Z=z]-E[\varphi_j(X)|Z=z_0]\}.$$
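Here, $\varphi_j$ is an antiderivative of $\phi_j$ (i.e., $\varphi_j'=\phi_j$), so the second step follows from integration by parts, and the last step multiplies both sides by $-1$, which swaps $z$ and $z_0$.
In Eq. (22), $G_1$ is the model that predicts $Y$ from $Z$, and $G_2$ is the model that predicts $\pi(X,W)$ from $Z$.
Thus, Eq. (22) is consistent with Eq. (3).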
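The step from the second to the third line of the derivation above relies on integration by parts. As a sketch, assume that $\Omega_X=[a,b]$ is bounded, that $\varphi_j$ is an antiderivative of $\phi_j$, and that all integrals are finite so the order of integration can be exchanged. Since $\varphi_j(X)=\varphi_j(a)+\int_a^b \mathbf{1}\{X>x\}\phi_j(x)\,dx$, taking conditional expectations gives
$$E[\varphi_j(X)|Z=z]=\varphi_j(a)+\int_{\Omega_X} p(X>x|Z=z)\phi_j(x)\,dx,$$
and subtracting this identity for $z$ from the same identity for $z_0$ cancels $\varphi_j(a)$:
$$\int_{\Omega_X}\{p(X>x|Z=z_0)-p(X>x|Z=z)\}\phi_j(x)\,dx=E[\varphi_j(X)|Z=z_0]-E[\varphi_j(X)|Z=z].$$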

>Comment: 
The true CAPCE depends on the values of $x$ and $w$, how do you get the overall MSE for all true CAPCE values?

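Our response: The MSE is computed over the test dataset ${\cal D}^{(1)'}$ as $\frac{1}{N_1'}\sum_{i=1}^{N_1'}(\hat{g}(x_i^{(1)'},w_i^{(1)'})-g(x_i^{(1)'},w_i^{(1)'}))^2$, where $N_1'$ is the number of test samples; that is, the squared errors between the estimated and true CAPCE values are averaged over all test points $(x_i^{(1)'},w_i^{(1)'})$.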
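As a purely hypothetical illustration of this averaging (the numbers below are not results from our experiments), suppose the test set has $N_1'=2$ points with estimation errors $\hat{g}-g$ of $0.1$ and $-0.3$. Then
$$\mathrm{MSE}=\frac{1}{2}\{(0.1)^2+(-0.3)^2\}=\frac{0.01+0.09}{2}=0.05.$$
A single MSE value therefore summarizes the estimation error over all test points, even though the true CAPCE varies with $x$ and $w$.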


\end{document}


%Comment 1: The proposed RKHS CAPCE method is very time consuming.
%The estimated value of coefficient for $W$ is far from the true coefficient for the P-CAPCE method when the sample size is 1000.

%Our response: Yes, these are weaknesses of our methods. They are common weaknesses of kernel methods and instrumental variable methods in general.


%Comment 2: In the real-data example, the sample size is 857, which is not too small, but the authors still say that ``We applied P-CAPCE and PTSLS. Other estimators are not used due to the small sample size." The proposed S-CAPCE and RKHS CAPCE methods have high requirement on the sample size, which may not be applicable in many practical problems.

%Our response: In our paper, we chose a classic example from the economics and instrumental variable literature. With the recent growth of big data applications, nonparametric estimation methods are expected to become more important in the future.


%Comment 4: 
%There are some typo errors in this paper.
%Our response: We will fix them.

%Comment 5: 
%There are some points that need to be explained and clarified.
%Our response: 



%Thus, $z$ and $z_0$ are flipped in Eq. (3) and the construction of estimation methods. 