\documentclass{article}

\usepackage{neurips_2023}
\usepackage{amsfonts} 
\usepackage{xcolor}
\newcommand{\jin}[1]{\textcolor{blue}{#1}}
\newcommand{\yuta}[1]{\textcolor{red}{#1}}

\begin{document}

Comment:
Thank you for your responses. If the authors could clarify the exact edits so that I can review them that would be most appreciated. Specifically for the final 2 questions, it is unclear what exactly the authors will do.

{\bf Response to 8/16 Comment.}

Thank you for the feedback. We describe how we plan to revise the paper to address your comments below.

>1. Can we re-structure the paper to avoid the unnecessary notation in the main paper and add more details to help the interested reader understand the approach?

As suggested by Reviewer H3y3, we plan to shorten Section 4.2 (Parametric CAPCE estimator) so that it focuses only on the key differences from Section 4.1, since the parametric estimator is constructed in a similar way to the sieve estimator in Section 4.1. The saved space will be used to add explanations that help readers better understand the paper. Specifically, to address your comments, we will make the following edits.

-To address your comment, ``For this reason and the point above, the emphasis on the definition and novelty of CAPCE in the paper feels somewhat overstated,'' and your question, ``What's the benefit of the derivative approach over the dose curve approach? Wouldn't dose curves be more useful in practice?'', we will add the following discussion after the definition of CAPCE at Line 90:

``The quantity represented by CAPCE has been implicitly studied in the literature (e.g., Galagate, 2016), and it is not tied to IV analysis. Still, most existing works have focused on $\mathbb{E}[Y_x|w]$. One contribution of this work is showing that, under the IV model, CAPCE is identifiable under a weaker separability assumption than that required for $\mathbb{E}[Y_x|w]$. We present theoretical and empirical results to show the usefulness of formally defining and investigating CAPCE and the merits of estimating CAPCE instead of $\mathbb{E}[Y_x|w]$. Granted, given an estimated $\mathbb{E}[Y_x|w]$, one can compute its derivative to obtain CAPCE, but not the other way around. However, one main interest in practice is often the causal effect from a reference point, e.g., CACE. CAPCE is enough to compute causal effects from a reference point: $\mathbb{E}[Y_{x''}-Y_{x'}|w]=\int_{x'}^{x''}\mathbb{E}[\partial_x Y_{x}|w]dx$.''

-To address your comments regarding Assumption 3.2, we will add your suggested explanation right after Assumption 3.2 at Line 104:

``Assumption 3.2 states that there cannot be any interactions between the unmeasured confounders ($H$) and treatment ($X$) unless they are fully mediated by the observed covariates ($W$).''

-To address your concern about the model selection criterion for S-CAPCE and P-CAPCE, we will add the following sentence at Line 167:

``The model selection in Stage 1 is a standard regression problem, and we presume the models in Stage 1 have been selected appropriately according to standard machine learning methods.''

and the following at Line 245:

``We presume the models in Stage 1 have been selected appropriately.''

Please let us know if you think some other parts of the paper are not clear and need explanation. Your comments are appreciated and are helpful for us to improve the paper.

>2. Can we improve the application section and compare to alternative work on this topic?

First, we will add the following sentences at Line 299:

``The use of mother's education as an instrument in this dataset has been the subject of debate in the literature (e.g., Blackburn and Neumark, 1992; Card, 1999; Kling, 2001; Wooldridge, 2010). We followed the first paper using this data (Blackburn and Neumark, 1992) in using mother's education as an instrument.''

The following will replace Lines 315-317:

While we estimate the heterogeneity of the causal effects of education on wages across subjects with different IQs, existing works using this dataset (Blackburn and Neumark, 1992; Card, 1999; Kling, 2001; Wooldridge, 2010) have used PTSLS and focused on the effects of education on wages over the whole population. Card (1999) and Wooldridge (2010) summarize the existing work on IV estimates and show that the estimates in all studies are positive, implying that education increases wages.
In contrast, our results give two new insights into the effects of education on wages.
First, our results suggest that for each sub-population $IQ=80, 100, 120$, education significantly affects wages at the compulsory school level but has little effect at the college level.
This result is consistent with the APCE estimates for the whole population given in Kawakami et al. (2023).
Second, we reveal that the effect of education on wages is more significant for high-IQ students, especially at the compulsory school level.
To the best of our knowledge, this result has not been reported in previous studies of this dataset, but it is consistent with the panel data analysis result in [1].

(Kawakami et al., 2023) Yuta Kawakami, Manabu Kuroki, Jin Tian. Instrumental Variable Estimation of Average Partial Causal Effects.  Proceedings of the 40th International Conference on Machine Learning, PMLR 202:16097-16130, 2023.



%Then, the effect of education on wages is more significant for high IQ students, especially at the compulsory school level.


%that people with an academic degree earn higher incomes than people who don't have an academic degree, even if they possess the same skills, not the effect of education.


%average partial causal effect (APCE)  by P-APCE estimator (Kawakami et al., 2023) for the whole population, $192.491-10.267x$.
%The increase in wages by getting a higher education at the college level seems to be due to the phenomenon described in [41] and [14].

%Blackburn and Neumark (1992) conduct PTSLS, Jeffrey (2001) uses more efficient PTSLS based on the weighting method, Card (1999) and Wooldridge (2010) summarize the existing works. Table 5 in Card (1999) summarize the IV estimates.The estimates of all studies in Table 5 are positive, and imply education increase wages.

%
%in line 299, and give additional analyses using other IVs, ``father's education" or ``number of siblings," for improvement of the application section.

%We also compare alternative work on this topic (wages, schooling and IQ) (Bound et al., 1986) and add the sentence in line 315

%``Bound et al. (1986) assume causal effects are constant for all subjects; however, we estimate the heterogeneity of causal effects."

%[1] Bound, John, et al. “Wages, Schooling and IQ of Brothers and Sisters: Do the Family Factors Differ?” International Economic Review, vol. 27, no. 1, 1986, pp. 77–105.





%Our notations correspond to previous studies [35,40] to be easily read. We will add the below sentence to help the interested reader understand our approach.

%``There are two key differences from previous works [35,40]. First, we build prediction models not only on treatment and covariates but also on an outcome in stage 1.Second, we take the difference of the predication values of $z_i$ and $z_0$ in stage 2."









{\bf Rebuttal (8/12).}

Thank you for your constructive comments and suggestions. They are very helpful for us to improve our paper. We will carefully incorporate them in the revised paper. In the following, your comments and questions are first stated and then followed by our responses.

Comment:

1. Although a direct definition of the CAPCE does not appear in the literature, I think many causal inference specialists would argue that it is implicit in other published works; for instance, see Section 2.3 in "Causal inference with a continuous treatment and outcome: Alternative estimators for parametric dose-response functions with applications" by Douglas Galagate.

2. The CAPCE need not be tied to an instrumental variable analysis. It's a perfectly reasonable estimand in a setting with an unconfounded treatment (e.g., an experiment with varying continuous doses of a drug) but no instrument. For this reason and the point above, the emphasis on the definition and novelty of CAPCE in the paper feels somewhat overstated.

Our response:

We agree that the quantity represented by CAPCE has been implicitly studied and that it is not tied to an instrumental variable analysis. Still, most existing works have focused on $\mathbb{E}[Y_x|w]$. One contribution of this work is showing that CAPCE $\mathbb{E}[\partial_x Y_x|w]$ is identifiable under a weaker separability assumption than that required for $\mathbb{E}[Y_x|w]$. The theoretical and empirical results of the paper show the usefulness of formally defining and investigating CAPCE and the merits of estimating CAPCE instead of $\mathbb{E}[Y_x|w]$.

Comment:

3. The methods developed in the paper lack uncertainty quantification, such as confidence intervals. It would be great if the authors could show how to provide uncertainty quantification, at least in limited scenarios. One such scenario is when the analyst is willing to assume a parametric model for all functions of interest. In this case, it should be possible to provide asymptotically valid uncertainty quantification using m-estimation methods. It may even be possible to leverage results from the debiased machine learning literature (see, for example, Chernozhukov et al (2018) on DML methods) to generate asymptotically valid uncertainty quantification for the CAPCE using machine learning methods for the nuisance functions via sample-splitting.

Our response:

Thank you for your advice. To our understanding, uncertainty quantification often relies on additive error terms, as in (Chernozhukov et al., 2018), which is a strong functional restriction.
The present paper deals with non-additive error terms, as in the setting of our experiments.
Uncertainty quantification could be an interesting direction for future research.

Comment:

4. While the applied data analysis is illustrative, the validity of the instrument is highly suspect. The exclusion restriction would preclude the existence of any variable that is correlated with a mother's education and her son's wage, conditional on the son's IQ and years of education.

Our response:

The use of mother's education as an instrument in this dataset has been the subject of debate in the literature (e.g., Blackburn and Neumark, 1992; Card, 1999; Kling, 2001; Wooldridge, 2010). We followed the first paper using this data (Blackburn and Neumark, 1992) in using mother's education as an instrument.

[1] Card, David. ``The causal effect of education on earnings.'' Handbook of Labor Economics 3 (1999): 1801--1863.

[2] Kling, Jeffrey R. ``Interpreting Instrumental Variables Estimates of the Returns to Schooling.'' Journal of Business \& Economic Statistics, vol. 19, no. 3, 2001, pp. 358--64.

Comment:

5. Assumption 3.2 is essential to the method, but its practical meaning is somewhat cryptic. One way of explaining it might be to say that there cannot be any interactions between confounders ($H$) and treatment ($X$) unless they are fully mediated by the observed covariates ($W$).

Our response:

We agree with your interpretation. We highlight here that Assumption 3.2 is weaker than the standard separability assumption Eq. (2) required by previous works (for identifying $\mathbb{E}[Y_x|w]$), as it allows interactions between the confounders ($H$) and the observed covariates ($W$).

Comment:

6. The model selection criterion for the S-CAPCE and P-CAPCE are somewhat unsatisfying because they capture only the second stage. 
Suppose, for example, that model parameters are selected such that both sides of (3) are zero. 
The model selection criterion would give this model a perfect score. 
Whatever criterion is employed ought to measure accuracy in both stages, similar to how the tuning parameters are selected for the RKHS CAPCE estimator.


Our response:

As you pointed out, the model selection criterion in the  paper for the S-CAPCE and P-CAPCE captures only the second stage.
We note that the model selection in the first stage is a standard regression problem, and we presume the models in the first stage have been selected appropriately according to standard machine learning methods.
We will clarify this point in the revised paper.

Question:

1. There is work by Kennedy et al. (https://arxiv.org/pdf/1507.00747.pdf) developing nonparametric methods for DR estimation of continous treatment effects. They target the dose curve $\theta(a)$. I think DR-learner type method could be easily constructed from their work that extends this to conditional dose curves. My question is whether the 2 are related. Specifically, how does one go from dose curves to these derivative focused approaches?

2. What's the benefit of the derivative approach over the dose curve approach? Wouldn't dose curves be more useful in practice?

Our response:

In terms of the notation used in this paper, the dose curve is $\theta(a) = \theta(x) = \mathbb{E}[Y_x]$ and the conditional dose curve is $\mathbb{E}[Y_x|w]$. The key difference between Kennedy et al. (2016) and this work is that Kennedy et al. (2016) identify $\mathbb{E}[Y_x]$ by adjustment over the covariates $W$ under the ignorability assumption (that is, no unmeasured confounders), while this work identifies CAPCE $\mathbb{E}[\partial_x Y_x|w]$ under the separability Assumption 3.2 in the IV setting, which allows unobserved confounders.

Given an estimated dose curve $\mathbb{E}[Y_x|w]$, one can compute its derivative to obtain CAPCE, but not the other way around. However, one main interest in practice is often the causal effect from a reference point, e.g., the conditional average causal effect (CACE) $\mathbb{E}[Y_1-Y_0|{w}]$, also known as the conditional average treatment effect (CATE). The derivatives of the dose curve are enough to compute causal effects from a reference point: $\mathbb{E}[Y_{x''}-Y_{x'}|{ w}]=\int_{x'}^{x''}\mathbb{E}[\partial_x Y_{x}|{ w}]dx$.
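
As a brief sketch of why this identity holds, assume that $Y_x$ is continuously differentiable in $x$ and that the conditional expectation and the integral can be interchanged (regularity conditions assumed here for illustration only). The fundamental theorem of calculus then gives
\[
\mathbb{E}[Y_{x''}-Y_{x'}|w]
=\mathbb{E}\Bigl[\int_{x'}^{x''}\partial_x Y_x\,dx \Bigm| w\Bigr]
=\int_{x'}^{x''}\mathbb{E}[\partial_x Y_x|w]\,dx .
\]
As a purely hypothetical example, if the CAPCE were linear in $x$, say $\mathbb{E}[\partial_x Y_x|w]=a(w)+b(w)x$ with illustrative coefficients $a(w)$ and $b(w)$ that are not taken from the paper, the effect from the reference point $x'$ would be
\[
\mathbb{E}[Y_{x''}-Y_{x'}|w]=a(w)\,(x''-x')+\frac{b(w)}{2}\,\bigl((x'')^2-(x')^2\bigr).
\]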

One contribution of this work is that we show, under the IV setting allowing unobserved confounders, CAPCE $\mathbb{E}[\partial_x Y_x|w]$ is identifiable under a weaker assumption than required for identifying $\mathbb{E}[Y_x|w]$: Assumption 3.2, $f_Y(X,W,H,u_Y)=f_{Y_1}(X,W,u_Y)+f_{Y_2}(W,H,u_Y)$, vs. the standard separability assumption Eq. (2), $f_Y(X,W,H,u_Y)=f_{Y_1}(X,W,u_Y)+f_{Y_2}(H,u_Y)$. This shows a major practical benefit of the derivative approach proposed in this paper over the dose curve approach.
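
To give some intuition for why the derivative is less demanding, here is a short sketch that assumes $f_{Y_1}$ is differentiable in $x$ and writes $Y_x$ for the outcome under the intervention $X=x$. Under Assumption 3.2,
\[
Y_x=f_{Y_1}(x,W,u_Y)+f_{Y_2}(W,H,u_Y),
\qquad
\partial_x Y_x=\partial_x f_{Y_1}(x,W,u_Y).
\]
The term $f_{Y_2}(W,H,u_Y)$, which carries the interaction between $W$ and $H$, does not depend on $x$ and so drops out of the derivative, whereas the level $Y_x$ still contains it. This is meant only as intuition and is not the identification argument itself.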

Question:

1. Can we re-structure the paper to avoid the unnecessary notation in the main paper and add more details to help the interested reader understand the approach?

Our response:

Thanks for the feedback. We will do our best to simplify  notation and add more explanations.

Question:

2. Can we improve the application section and compare to alternative work on this topic?

Our response:

We will enhance the application section with a more detailed discussion of existing work on this topic.


\end{document}