
%We show an application to a real-world dataset in education.


{\bf Dataset.}
We use an open dataset from the UC Irvine Machine Learning Repository (\url{https://archive.ics.uci.edu/dataset/320/student+performance})
on student performance in mathematics in secondary education at two Portuguese schools.
Secondary education lasts three years, and students are tested once a year, three times in total.
%This data approaches student achievement in secondary education of two Portuguese schools. 
The data attributes include demographic, social, and school-related features and student grades. %and it was collected by using school reports and questionnaires.
The sample size is $649$ with no missing values. 
Prior research using this dataset aimed to predict students' performance from their attributes \citep{Cortez2008,Helwig2017}.
We assess the causal relationship between the students' performance, study time, and extra paid classes by estimating the PoC introduced in this paper.

{\bf Variables.}
We take the scores of mathematics in the final period ($Y^1$), in the second period ($Y^2$), and in the first period ($Y^3$) as the outcome variables ${\boldsymbol Y}=(Y^1,Y^2,Y^3)$. $Y^1, Y^2, Y^3$ take values from $\{0, 1, \ldots, 20\}$.  We assume a lexicographical order $\succ_{\text{lexi}}$ on $\boldsymbol{Y}$. For example, $(Y^1,Y^2,Y^3) \succ_{\text{lexi}} (6, 6, 6)$ means ``$Y^1>6$'' or ``$Y^1=6\land Y^2>6$'' or ``$Y^1=6\land Y^2=6\land Y^3>6$''.
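For concreteness, the order can be written in the standard form below; the generic vectors ${\boldsymbol a}=(a^1,a^2,a^3)$ and ${\boldsymbol b}=(b^1,b^2,b^3)$ are notation introduced only for this illustration.
\[
{\boldsymbol a}\succ_{\text{lexi}}{\boldsymbol b}
\iff
\exists\, k\in\{1,2,3\}:\ a^j=b^j\ \text{for all } j<k\ \text{ and }\ a^k>b^k .
\]
For instance, $(7,0,0)\succ_{\text{lexi}}(6,20,20)$ because the final-period score alone decides the comparison, whereas $(6,6,5)\prec_{\text{lexi}}(6,6,6)$ because the first two coordinates tie and $5<6$.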
\begin{comment}
We pick up a tuple of the three outcome variables, ${\boldsymbol Y}=(Y^1,Y^2,Y^3)$,
\begin{itemize}
    \vspace{-0.1cm}
      \setlength{\parskip}{0.cm}
  \setlength{\itemsep}{0.15cm}
    \item[] $Y^1$: \yuta{Scores of mathematics in the final period}, \jin{What is 'final' grade?}
    \item[] $Y^2$: Scores of mathematics in the second period,
    \item[] $Y^3$: Scores of mathematics in the first period
\end{itemize}
\vspace{-0.1cm}
and they take the score from $0$ to $20$, respectively.
We introduce lexicographical order to outcomes.
Say ${\boldsymbol y}=(6,6,6)$;
the statement ``${\boldsymbol Y}\succ_{\text{lexi}}{\boldsymbol y}$'' means 
%\begin{center}
\yuta{``{\it one gets scores over 6 in the final period}'' or ``{\it one gets 6 scores in the final period and over 6 in the second period}'' or ``{\it one gets 6 scores in the final and second periods and over 6 in the first period }.''}
%    \yuta{``$Y^1>6$'' or ``$Y^1=6\land Y^2>6$'' or ``$Y^1=6\land Y^2=6\land Y^3>6$.''}
%\end{center}
\end{comment}
We consider ``\emph{study time in a week}'' ($X^1$) and ``\emph{extra paid classes within the course subject}'' ($X^2$) (yes: $X^2=2$, no: $X^2=1$) as treatment variables ${\boldsymbol X} = (X^1, X^2)$. 
%\st{Let the subject's all other attributes be the covariates ($\boldsymbol{C}$).} \jin{But in the experiments, you only selected 3 of them as covariates.} 
We select ``sex'', ``failures'', ``schoolsup'', ``famsup'', and ``goout'' as the covariates ($\boldsymbol{C}$), as in the previous study \citep{Helwig2017}. % which are chosen with $p<0.05$ estimated coefficients by Table 3 in \citep{Helwig2017}, which is the previous study using this dataset.}


%\jin{We select ''school'', ``sex'', and ``age'' as the  covariates ($\boldsymbol{C}$).} \jin{Why only select these 3? Are there other attributes that have impact on both X and Y? }
%We assume Assumption \ref{RP2} which means that the counterfactual rankings of scores ${\boldsymbol Y}_{\boldsymbol x}$ among students are preserved for different ${\boldsymbol x}$ (study time and extra paid classes) given any covariates ${\boldsymbol C}={\boldsymbol c}$. 
We adopt Assumption \ref{AS2}, %since it is reasonable to assume 
which means that latent exogenous variables, such as the student's mental and physical condition on the test day, have a monotonic impact on the test scores.


%\yuta{since it is reasonable to consider the counterfactual ranking of all student's scores ${\boldsymbol Y}_{\boldsymbol x}$ are preserved for any ${\boldsymbol x}$ (study time and extra paid classes) given the students covariates ${\boldsymbol C}={\boldsymbol c}$.} \jin{Justification/intuition for this assumption?}




{\bf Estimation Methods.}
%We do not discuss the estimation problems of PoC in this paper. However, each type of PoC is easily estimable through the conditional CDF, i.e., $\hat{\rho}({\boldsymbol y};{\boldsymbol x},{\boldsymbol c})$ and $\hat{\rho}^o({\boldsymbol y};{\boldsymbol x},{\boldsymbol c})$, by standard regression methods using sampled i.i.d dataset because all theorems in this paper consist of conditional CDF.
All identification theorems in this paper express PoC in terms of conditional CDFs, e.g., $\rho({\boldsymbol y};{\boldsymbol x},{\boldsymbol c})=\mathbb{P}({\boldsymbol Y}\prec {\boldsymbol y}|{\boldsymbol X}={\boldsymbol x},{\boldsymbol C}={\boldsymbol c})$. %$\hat{\rho}({\boldsymbol y};{\boldsymbol x},{\boldsymbol c})$. 
We estimate the conditional CDFs by logistic regression {using the ``glm'' function in R.} 
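As a sketch of this step, one logistic regression can be fitted per threshold ${\boldsymbol y}$ with the binary response $\mathbb{I}({\boldsymbol Y}\prec{\boldsymbol y})$. The main-effects linear predictor and the coefficient symbols $\beta_0$, ${\boldsymbol \beta}_X$, and ${\boldsymbol \beta}_C$ below are assumptions made for illustration; the exact model formula, and whether the treatments enter as numeric or categorical terms, may differ in the actual analysis.
\[
\hat{\rho}({\boldsymbol y};{\boldsymbol x},{\boldsymbol c})
=\frac{1}{1+\exp\bigl\{-\bigl(\hat{\beta}_0+\hat{\boldsymbol \beta}_X^{\top}{\boldsymbol x}+\hat{\boldsymbol \beta}_C^{\top}{\boldsymbol c}\bigr)\bigr\}} .
\]
Under this sketch, the fitted probability evaluated at the treatment values of interest and at the covariate values of the subject is the plug-in estimate of the conditional CDF that enters the identification formulas.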
%$\mathbb{I}({\boldsymbol Y}\preceq {\boldsymbol y}) \sim {\boldsymbol X}+{\boldsymbol C}$ given values of ${\boldsymbol y}$,
%for ${\boldsymbol y}=(5,5,5)$ and $(6,6,6)$
%using R-package ``glmnet'' (\url{https://cran.r-project.org/web/packages/glmnet/index.html}). 
We use the bootstrap \citep{Efron1979} to approximate the sampling distribution of the estimator. % of each type of PoC.
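One common way to turn bootstrap replicates into an interval is the percentile method, sketched below. That the percentile method is used, the number of resamples $B$, and the symbol $\hat{\theta}$ (standing for any PoC estimate) are assumptions of this illustration, not details taken from the analysis.
\[
\hat{\theta}^{*1},\ldots,\hat{\theta}^{*B}
\ \longmapsto\
\text{CI}_{95\%}=\bigl[\hat{q}_{0.025},\ \hat{q}_{0.975}\bigr],
\]
where each $\hat{\theta}^{*b}$ is obtained by resampling the subjects with replacement, refitting the logistic regressions, and recomputing the PoC, and $\hat{q}_{\alpha}$ denotes the empirical $\alpha$-quantile of the $B$ replicates.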
%\jin{In the appendix, you showed results by both logistic regression and logistic ridge regression. They gave very different results. Which method to use? I'm not sure it's a good idea to show results by both methods.}



%\yuta{[Comment: I have reanalyzed using glm package.]}


{\bf Results.}
We consider %the following four counterfactual statements related to 
the subject whose ID number is 1.
Let the values of her covariates be ${\boldsymbol c}_1$.
In reality, she studied for $2$ hours a week and took no extra paid classes (${\boldsymbol x}'=(2,1)$), and she scored $6$, $6$, and $5$ in the final, second, and first periods, respectively (${\boldsymbol y}'=(6,6,5)$).
Her other attributes are shown in Appendix \ref{appB}.
%In this section, we pick up school, sex, and age as the subjects' covariates, and Appendix \ref{appB} provides the estimates by logistic regression and logistic ridge regression, including all variables. \jin{What do you mean by "including all variables"?}

In the first study, we evaluate conditional PNS, PN, and PS by setting ${\boldsymbol y}=(6,6,6)$, ${\boldsymbol x}_0=(2,1)$, ${\boldsymbol x}_1=(4,2)$, and ${\boldsymbol C}={\boldsymbol c}_1$ in Def. \ref{def41} to assess how necessary and sufficient setting ${\boldsymbol x}_1$, relative to
${\boldsymbol x}_0$, is for producing ${\boldsymbol Y}\succeq_{\text{lexi}}{\boldsymbol y}$ in the sub-population characterized by ${\boldsymbol C}={\boldsymbol c}_1$. 
%\yuta{Using PNS, PN, and PS, we try revealing the causal relationship between setting ${\boldsymbol x}_0$ compared to ${\boldsymbol x}_1$ and provoking  ${\boldsymbol Y}\succeq_{\text{lexi}}{\boldsymbol y}$ for a sub-population ${\boldsymbol C}={\boldsymbol c}_1$.}
The estimated values of conditional PNS, PN, and PS are 
%\jin{These numbers are different from those shown in the appendix???}
\begin{equation}
    \begin{aligned}
       &\text{PNS:} &8.862 \% &(\text{CI}: [1.122\%,19.510\%]),\\
       &\text{PN:} &9.212 \% &(\text{CI}: [1.133\%,20.647\%]),\\
      &\text{PS:}  &72.331 \% &(\text{CI}: [27.975\%,93.022\%]),
    \end{aligned}
\end{equation}
where CI denotes the $95\%$ confidence interval. 
The PNS value above represents the probability of the following statement:
\vspace{-0.25cm}
\begin{center}
    ``{\it {A student with attribute values ${\boldsymbol c}_1$}  
    would get scores ${\boldsymbol Y}\succeq_{\text{lexi}}{\boldsymbol y}$ had she studied 4 hours a week and taken extra classes and would get scores ${\boldsymbol Y}\prec_{\text{lexi}}{\boldsymbol y}$ had she studied 2 hours a week and taken no extra class.}''
\end{center}
\vspace{-0.25cm}
PN represents the probability of the following statement:
\vspace{-0.25cm}
\begin{center}
    ``{\it {A student with attribute values ${\boldsymbol c}_1$} would get scores ${\boldsymbol Y}\prec_{\text{lexi}}{\boldsymbol y}$ had she studied 2 hours a week and taken no extra class when, in reality, she scored ${\boldsymbol Y}\succeq_{\text{lexi}}{\boldsymbol y}$, studied 4 hours a week, and took extra classes.}''
\end{center}
\vspace{-0.25cm} 
%For instance, 
PS represents the probability of the following statement:
\vspace{-0.25cm}
\begin{center}
    ``{\it {A student with attribute values ${\boldsymbol c}_1$} would get scores ${\boldsymbol Y}\succeq_{\text{lexi}}{\boldsymbol y}$ had she studied 4 hours a week and taken extra classes when, in reality, she scored ${\boldsymbol Y}\prec_{\text{lexi}}{\boldsymbol y}$, studied 2 hours a week, and took no extra class.}''
\end{center}
\vspace{-0.25cm}
The results show that PNS and PN are relatively low, whereas PS is relatively high. In other words, 
for students with attribute values ${\boldsymbol c}_1$, studying 4 hours and taking extra classes is unlikely to be ``necessary and sufficient'' or ``necessary'' to achieve ${\boldsymbol Y}\succeq_{\text{lexi}}{\boldsymbol y}$ compared to studying 2 hours and taking no extra class; however, it is highly ``sufficient''.
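In counterfactual notation, writing ${\boldsymbol Y}_{\boldsymbol x}$ for the scores that would be obtained under ${\boldsymbol X}={\boldsymbol x}$, the three verbal statements of this study can be summarized as follows. This is a restatement of the statements given here, assumed to agree with Def. \ref{def41}, and not a new definition.
\[
\begin{aligned}
\text{PNS} &= \mathbb{P}\bigl({\boldsymbol Y}_{{\boldsymbol x}_1}\succeq_{\text{lexi}}{\boldsymbol y},\ {\boldsymbol Y}_{{\boldsymbol x}_0}\prec_{\text{lexi}}{\boldsymbol y}\mid {\boldsymbol C}={\boldsymbol c}_1\bigr),\\
\text{PN} &= \mathbb{P}\bigl({\boldsymbol Y}_{{\boldsymbol x}_0}\prec_{\text{lexi}}{\boldsymbol y}\mid {\boldsymbol Y}\succeq_{\text{lexi}}{\boldsymbol y},\ {\boldsymbol X}={\boldsymbol x}_1,\ {\boldsymbol C}={\boldsymbol c}_1\bigr),\\
\text{PS} &= \mathbb{P}\bigl({\boldsymbol Y}_{{\boldsymbol x}_1}\succeq_{\text{lexi}}{\boldsymbol y}\mid {\boldsymbol Y}\prec_{\text{lexi}}{\boldsymbol y},\ {\boldsymbol X}={\boldsymbol x}_0,\ {\boldsymbol C}={\boldsymbol c}_1\bigr),
\end{aligned}
\]
with ${\boldsymbol y}=(6,6,6)$, ${\boldsymbol x}_0=(2,1)$, and ${\boldsymbol x}_1=(4,2)$.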

In the second study, we consider more detailed evidence than in the first study and evaluate conditional PNS with evidence $({\boldsymbol y}',{\boldsymbol x}',{\boldsymbol c})$, letting ${\boldsymbol y}=(6,6,6)$, ${\boldsymbol y}'=(6,6,5)$, ${\boldsymbol x}_0=(2,1)$,
${\boldsymbol x}_1=(4,2)$, 
${\boldsymbol x}'=(2,1)$, and ${\boldsymbol C}={\boldsymbol c}_1$ in Def. \ref{EV1}. 
%\yuta{We consider more detailed evidence $({\boldsymbol y}',{\boldsymbol x}',{\boldsymbol c}_1)$ than the first analysis.}
The estimated value  is 
\begin{equation}
     \text{PNS:}\ \  0.024 \%\ \ \ \  (\text{CI}: [0.000\%,0.243\%]),
\end{equation}
which is the probability of the following statement:
\vspace{-0.25cm}
\begin{center}
    ``{\it A student with attribute values ${\boldsymbol c}_1$ would get scores ${\boldsymbol Y}\succeq_{\text{lexi}}{\boldsymbol y}$ had she studied 4 hours a week and taken extra classes and
    would get scores ${\boldsymbol Y}\prec_{\text{lexi}}{\boldsymbol y}$ had she studied 2 hours a week and taken no extra class 
    when, in reality, she scored ${\boldsymbol Y}={\boldsymbol y}'$, studied 2 hours a week, and took no extra class.}''
\end{center}
\vspace{-0.25cm}
This probability is very low; that is, 
for students with evidence $({\boldsymbol y}',{\boldsymbol x}',{\boldsymbol c}_1)$, studying 4 hours and taking extra classes is probably not ``necessary and sufficient'' to achieve ${\boldsymbol Y}\succeq_{\text{lexi}}{\boldsymbol y}$ compared to studying 2 hours and taking no extra class.
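The role of the evidence can be made explicit with a short worked step. Assuming the usual consistency property of counterfactuals (observing ${\boldsymbol X}={\boldsymbol x}'$ implies ${\boldsymbol Y}_{{\boldsymbol x}'}={\boldsymbol Y}$), the evidence ${\boldsymbol x}'={\boldsymbol x}_0$ and ${\boldsymbol y}'=(6,6,5)\prec_{\text{lexi}}(6,6,6)={\boldsymbol y}$ already fixes the second half of the statement, so that
\[
\mathbb{P}\bigl({\boldsymbol Y}_{{\boldsymbol x}_1}\succeq_{\text{lexi}}{\boldsymbol y},\ {\boldsymbol Y}_{{\boldsymbol x}_0}\prec_{\text{lexi}}{\boldsymbol y}\mid {\boldsymbol y}',{\boldsymbol x}',{\boldsymbol c}_1\bigr)
=\mathbb{P}\bigl({\boldsymbol Y}_{{\boldsymbol x}_1}\succeq_{\text{lexi}}{\boldsymbol y}\mid {\boldsymbol y}',{\boldsymbol x}',{\boldsymbol c}_1\bigr).
\]
Under this reading, the estimate concerns only whether a student observed at ${\boldsymbol y}'$ under ${\boldsymbol x}_0$ would reach ${\boldsymbol y}$ under ${\boldsymbol x}_1$.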

In the third study, we evaluate conditional PNS with multi-hypothetical terms, letting ${\boldsymbol y}_1=(5,5,5)$, ${\boldsymbol y}_2=(6,6,6)$, ${\boldsymbol x}_0=(1,1)$, ${\boldsymbol x}_1=(2,1)$, ${\boldsymbol x}_2=(4,2)$, and ${\boldsymbol C}={\boldsymbol c}_1$ in Def. \ref{EV2}.
%\yuta{
%We next focus on achieving scores ${\boldsymbol y}_1 \preceq_{\text{lexi}}{\boldsymbol Y}\prec_{\text{lexi}}{\boldsymbol y}_2$.
%Using PNS with multi-hypothetical terms, we try revealing the causal relationship between setting ${\boldsymbol x}_1$ compared to ${\boldsymbol x}_0$ and ${\boldsymbol x}_2$ and provoking ${\boldsymbol y}_1 \preceq_{\text{lexi}}{\boldsymbol Y}\prec_{\text{lexi}}{\boldsymbol y}_2$ for a sub-population ${\boldsymbol C}={\boldsymbol c}_1$.}
The estimated value  is 
\begin{equation}
     \text{PNS:}\ \  0.000 \%\ \ \ \  (\text{CI}: [0.000\%,0.000\%]),
\end{equation}
which is the joint probability of the following three counterfactual statements:
\vspace{-0.25cm}
\begin{center}
    ``{\it (i) {A student with attribute values ${\boldsymbol c}_1$} would get scores ${\boldsymbol Y}\succeq_{\text{lexi}}{\boldsymbol y}_2$ had she studied 4 hours a week and taken extra classes,\\
    (ii) she would get scores ${\boldsymbol y}_1 \preceq_{\text{lexi}}{\boldsymbol Y}\prec_{\text{lexi}}{\boldsymbol y}_2$ had she studied 2 hours a week and taken no extra classes, and\\
    (iii) she would get scores ${\boldsymbol Y}\prec_{\text{lexi}}{\boldsymbol y}_1$ had she studied $1$ hour a week and taken no extra classes.}''
\end{center}
\vspace{-0.25cm}
This probability is close to zero; that is, 
for students with attribute values ${\boldsymbol c}_1$, studying 2 hours and taking no extra class is not ``necessary and sufficient'' to achieve ${\boldsymbol y}_1 \preceq_{\text{lexi}}{\boldsymbol Y}\prec_{\text{lexi}}{\boldsymbol y}_2$ compared to ``studying 1 hour and taking no extra class'' or ``studying 4 hours and taking extra classes''.

%\yuta{[Comment: We have the folloing necessary and sufficient relationship:\\
%${\boldsymbol X}={\boldsymbol x}_1 \Rightarrow {\boldsymbol y}_1 \preceq_{\text{lexi}}{\boldsymbol Y}\prec_{\text{lexi}}{\boldsymbol y}_2$,\\
%${\boldsymbol X}={\boldsymbol x}_0 \lor {\boldsymbol X}={\boldsymbol x}_2 \Rightarrow \lnot ({\boldsymbol y}_1 \preceq_{\text{lexi}}{\boldsymbol Y}\prec_{\text{lexi}}{\boldsymbol y}_2)$,\\
%where $\lnot ({\boldsymbol X}={\boldsymbol x}_1)=({\boldsymbol X}={\boldsymbol x}_0) \lor ({\boldsymbol X}={\boldsymbol x}_2)$.
%]}


Finally, in the fourth study, we consider more detailed evidence than in the third study and evaluate conditional PNS with multi-hypothetical terms and evidence $({\boldsymbol y}',{\boldsymbol x}',{\boldsymbol c})$, letting ${\boldsymbol y}_1=(5,5,5)$, ${\boldsymbol y}_2=(6,6,6)$, ${\boldsymbol y}'=(6,6,5)$, ${\boldsymbol x}_0=(1,1)$, ${\boldsymbol x}_1=(2,1)$,
${\boldsymbol x}_2=(4,2)$, 
${\boldsymbol x}'=(2,1)$, and ${\boldsymbol C}={\boldsymbol c}_1$ in Def. \ref{EV3}. 
%\yuta{We consider more detailed evidence $({\boldsymbol y}',{\boldsymbol x}',{\boldsymbol c}_1)$ than the third analysis.}
The estimated value is 
\begin{equation}
     \text{PNS:}\ \  96.711 \%\ \ \ \  (\text{CI}: [59.059\%,100.000\%]),
\end{equation}
which represents the probability of the three counterfactual statements in the third study given the additional information ${\boldsymbol x}'$ and ${\boldsymbol y}'$.
Unlike PNS with multi-hypothetical terms in the third study, PNS with multi-hypothetical terms and evidence $({\boldsymbol y}',{\boldsymbol x}',{\boldsymbol c}_1)$ is relatively high. That is, 
for students with evidence $({\boldsymbol y}',{\boldsymbol x}',{\boldsymbol c}_1)$, studying 2 hours and taking no extra class is highly ``necessary and sufficient'' to achieve ${\boldsymbol y}_1 \preceq_{\text{lexi}}{\boldsymbol Y}\prec_{\text{lexi}}{\boldsymbol y}_2$ compared to ``studying 1 hour and taking no extra class'' and ``studying 4 hours and taking extra classes''. 
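A short worked step may help explain why the evidence raises the value. Here ${\boldsymbol x}'={\boldsymbol x}_1=(2,1)$ and ${\boldsymbol y}_1=(5,5,5)\preceq_{\text{lexi}}{\boldsymbol y}'=(6,6,5)\prec_{\text{lexi}}(6,6,6)={\boldsymbol y}_2$. Assuming the usual consistency property of counterfactuals (observing ${\boldsymbol X}={\boldsymbol x}'$ implies ${\boldsymbol Y}_{{\boldsymbol x}'}={\boldsymbol Y}$), the middle statement ${\boldsymbol y}_1\preceq_{\text{lexi}}{\boldsymbol Y}_{{\boldsymbol x}_1}\prec_{\text{lexi}}{\boldsymbol y}_2$ holds with certainty given the evidence, so that
\[
\begin{aligned}
&\mathbb{P}\bigl({\boldsymbol Y}_{{\boldsymbol x}_2}\succeq_{\text{lexi}}{\boldsymbol y}_2,\ {\boldsymbol y}_1\preceq_{\text{lexi}}{\boldsymbol Y}_{{\boldsymbol x}_1}\prec_{\text{lexi}}{\boldsymbol y}_2,\ {\boldsymbol Y}_{{\boldsymbol x}_0}\prec_{\text{lexi}}{\boldsymbol y}_1\mid {\boldsymbol y}',{\boldsymbol x}',{\boldsymbol c}_1\bigr)\\
&\qquad=\mathbb{P}\bigl({\boldsymbol Y}_{{\boldsymbol x}_2}\succeq_{\text{lexi}}{\boldsymbol y}_2,\ {\boldsymbol Y}_{{\boldsymbol x}_0}\prec_{\text{lexi}}{\boldsymbol y}_1\mid {\boldsymbol y}',{\boldsymbol x}',{\boldsymbol c}_1\bigr).
\end{aligned}
\]
Only the two statements about the unobserved treatments ${\boldsymbol x}_0$ and ${\boldsymbol x}_2$ remain uncertain once the evidence is given.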
%\yuta{Given $({\boldsymbol y}',{\boldsymbol x}',{\boldsymbol c})$, studying 2 hours and taking no extra class can be ``necessary and sufficient" to achieve ${\boldsymbol y}_1 \preceq_{\text{lexi}}{\boldsymbol Y}\prec_{\text{lexi}}{\boldsymbol y}_2$ compared to ``studying 1 hours and taking no extra class" and ``studying 4 hours and taking extra classes".}

%The estimated values are all $0$ or $1$. \jin{What values are 0 or 1?}\yuta{[Old Results.]}

%\jin{Other potential meaningful experiments to perform? Maybe PoC of each $X^1$ and $X^2$ individually to compare with joint effects?}

We have also performed additional analyses. To evaluate the effect of study time ($X^1$) alone, we let ${\boldsymbol x}_1=(4,1)$ in the first and second studies, and ${\boldsymbol x}_2=(4,1)$ in the third and fourth studies.
The results are shown in Appendix \ref{appB}; all estimated PoC values are lower than those obtained with the joint effect of study time and extra paid classes.
To evaluate the effect of extra paid classes ($X^2$) alone, we let ${\boldsymbol x}_1=(2,2)$ in the first and second studies.
The results are shown in Appendix \ref{appB}; all estimated PoC values are again lower than those obtained with the joint effect. 

