
We present an application using a real-world dataset.

{\bf Dataset.}
We use an open dataset
%the UC Irvine Machine Learning Repository 
(\url{https://archive.ics.uci.edu/dataset/320/student+performance}) on student performance in mathematics in secondary education at two Portuguese schools.
Secondary education lasts for three years, and students are tested once per year, resulting in a total of three tests.
%This data approaches student achievement in secondary education of two Portuguese schools. 
The dataset includes attributes related to demographics, social factors, school-related features, and student grades. 
%and it was collected by using school reports and questionnaires.
The sample size is 649, with no missing values.
Prior research using this dataset aimed to predict students' performance from their attributes \citep{Cortez2008, Helwig2017}.
\citet{Kawakami2024} assess the causal relationship between students' performance, study time, and extra paid classes by estimating PoC.
%introduced in this paper.
In this paper, we analyze the causal relationship between students' performance in the final period, study time, and extra paid classes, treating their performance in the first and second periods as mediators.



{\bf Variables.}
We treat the mathematics scores in the first period (${M}$) and the second period (${N}$) as mediators and the mathematics score in the final period ($Y$) as the outcome variable.
All three scores take integer values from $0$ to $20$.
%$\{0, 1, \ldots, 20\}$, respectively. 
%We consider these variables as discretized versions of normally distributed variables.
We note that \citet{Kawakami2024} considered all mathematics scores from the first period, the second period, and the final period as the vector of outcome variables.
We consider ``study time in a week'' ($X^1$) and ``extra paid classes within the course subject'' ($X^2$, with $X^2=2$ for yes and $X^2=1$ for no) as the treatment variables, denoted by $X= (X^1, X^2)$.
%We consider ``\emph{study time in a week}'' ($X^1$) and ``\emph{extra paid classes within the course subject}'' ($X^2$) (yes: $X^2=2$, no: $X^2=1$) as treatment variables $X= (X^1, X^2)$. 
Following \citet{Helwig2017}, we select ``sex,'' ``failures,'' ``schoolsup,'' ``famsup,'' and ``goout'' as the covariates ($C$); the same covariates are also used in \citet{Kawakami2024}.
%We select ``sex'', ``failures'', ``schoolsup'', ``famsup'', and ``goout'' as the covariates ($C$), which were chosen in  \citep{Helwig2017} and used in \citep{Kawakami2024}.
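As a minimal sketch of the variable coding above, the helper below maps one row of the UCI file to $(M, N, Y, X^1, X^2, C)$. It assumes the file's semicolon-separated layout and its column names (\texttt{G1}, \texttt{G2}, \texttt{G3}, \texttt{studytime}, \texttt{paid}, \texttt{sex}, \texttt{failures}, \texttt{schoolsup}, \texttt{famsup}, \texttt{goout}); the function names and the numeric 0/1 coding of the binary covariates are hypothetical choices made for illustration, not necessarily the coding used in the analysis.

```python
import csv
import io

# Assumed 0/1 coding for yes/no covariates (illustrative only).
YES_NO = {"yes": 1, "no": 0}


def encode_row(row):
    """Map one UCI record (a dict of strings) to the variables in the text."""
    return {
        "M": int(row["G1"]),  # first-period mathematics score, 0..20
        "N": int(row["G2"]),  # second-period mathematics score, 0..20
        "Y": int(row["G3"]),  # final-period mathematics score, 0..20
        "X1": int(row["studytime"]),  # study time, as coded in the file
        "X2": 2 if row["paid"] == "yes" else 1,  # extra paid classes: yes=2, no=1
        "C": (
            1 if row["sex"] == "F" else 0,  # hypothetical coding: F=1, M=0
            int(row["failures"]),
            YES_NO[row["schoolsup"]],
            YES_NO[row["famsup"]],
            int(row["goout"]),
        ),
    }


def load_records(text):
    """Parse semicolon-separated file contents and encode every row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return [encode_row(row) for row in reader]
```

In use, one would pass the contents of the downloaded CSV file (its path is left to the reader) to the loader; quoted string fields such as \texttt{"F"} are unquoted by the \texttt{csv} module automatically.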
We estimate the path-specific PNS using linear regression models, as described in Section 5.
We conduct 1,000 bootstrap resampling iterations \citep{Efron1979} to approximate the sampling distributions of the estimators and to construct the confidence intervals (CIs) reported below.
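How the intervals are formed from the 1,000 bootstrap replicates is not spelled out here; the sketch below assumes the common percentile method with nonparametric resampling of subjects. The \texttt{estimator} argument is a hypothetical placeholder for a function that would refit the regression models and recompute a PNS estimate on each resample. Since a PNS estimate lies in $[0,1]$, a percentile interval built this way also stays within $[0,1]$.

```python
import random


def bootstrap_percentile_ci(data, estimator, n_boot=1000, alpha=0.05, seed=0):
    """Nonparametric bootstrap with a percentile confidence interval.

    data: list of observations (e.g., one record per student).
    estimator: function mapping a list of observations to a float.
    Returns the full-sample estimate and the (lower, upper) bounds.
    """
    rng = random.Random(seed)
    n = len(data)
    # Resample subjects with replacement and re-run the estimator each time.
    replicates = sorted(estimator(rng.choices(data, k=n)) for _ in range(n_boot))
    # Empirical alpha/2 and 1 - alpha/2 quantiles of the sorted replicates.
    lower = replicates[int(alpha / 2 * n_boot)]
    upper = replicates[int((1 - alpha / 2) * n_boot) - 1]
    return estimator(data), (lower, upper)
```

For example, calling it with a list of 0/1 outcomes and the sample mean as \texttt{estimator} returns the mean together with a 95\% percentile interval.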




In this dataset, Assumption \ref{AS1}' is reasonable; for instance, it is plausible that $\mathbb{P}(Y_{x,{M}_{x'},{N}_{x,{M}_{x}}} \prec y \preceq Y_{x,{M}_{x'},{N}_{x,{M}_{x'}}} \mid C=c)=0$.
This is because a student's score in the first period would plausibly be higher had she studied four hours a week and taken extra classes than had she studied only one hour a week and taken no extra classes.
Moreover, had her score in the first period been higher, her score in the second period would also have been higher.
Furthermore, had her score in the second period been higher, her score in the final period would also have been higher.
%higher scores in the first period lead to higher scores in the second period, which in turn result in higher scores in the final period.
Assumption \ref{ASM} is also reasonable: for example, if $U^{{M}}$ represents a genetic factor influencing mathematics performance, it may have a strictly monotonically increasing effect on the score in the first period.
%Assumption \ref{ASM} is also reasonable since, for example, considering $U^{{M}}$ is the genetic factor for math performance, $U^{{M}}$ can have a strictly monotonic increasing influence on the scores in the first period.
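To illustrate why the chain of reasoning above rules out the event in Assumption \ref{AS1}', the sketch below simulates a hypothetical linear structural model in which $X$ raises $M$, $M$ raises $N$, and $N$ raises $Y$, with the same exogenous noise $(U^{M}, U^{N}, U^{Y})$ shared across counterfactual worlds. The coefficients and the Gaussian noise are assumptions for illustration, not estimates from the data. Because every coefficient on the path $X \rightarrow M \rightarrow N \rightarrow Y$ is nonnegative, $Y_{x,M_{x'},N_{x,M_x}} \geq Y_{x,M_{x'},N_{x,M_{x'}}}$ holds in every draw, so the simulated probability of the event is zero.

```python
import random

# Hypothetical, nonnegative coefficients (illustration only, not estimates).
ALPHA_X = (0.5, 1.0)  # effect of X = (X^1, X^2) on M
BETA_X = (0.2, 0.3)   # direct effect of X on N
BETA_M = 0.9          # effect of M on N
GAMMA_X = (0.1, 0.2)  # direct effect of X on Y
GAMMA_M = 0.1         # effect of M on Y
GAMMA_N = 0.9         # effect of N on Y


def dot(coefs, x):
    return sum(c * v for c, v in zip(coefs, x))


def f_m(x, u_m):
    return dot(ALPHA_X, x) + u_m


def f_n(x, m, u_n):
    return dot(BETA_X, x) + BETA_M * m + u_n


def f_y(x, m, n, u_y):
    return dot(GAMMA_X, x) + GAMMA_M * m + GAMMA_N * n + u_y


def violation_rate(x, x_prime, y, draws=10_000, seed=1):
    """Monte Carlo estimate of
    P(Y_{x, M_{x'}, N_{x, M_x}} < y <= Y_{x, M_{x'}, N_{x, M_{x'}}})."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        # One unit: the same exogenous noise is reused in every counterfactual.
        u_m, u_n, u_y = rng.gauss(0, 3), rng.gauss(0, 3), rng.gauss(0, 3)
        m_x, m_xp = f_m(x, u_m), f_m(x_prime, u_m)
        y_a = f_y(x, m_xp, f_n(x, m_x, u_n), u_y)   # Y_{x, M_{x'}, N_{x, M_x}}
        y_b = f_y(x, m_xp, f_n(x, m_xp, u_n), u_y)  # Y_{x, M_{x'}, N_{x, M_{x'}}}
        if y_a < y <= y_b:
            hits += 1
    return hits / draws
```

With $x=(4,2)$, $x'=(1,1)$, and $y=10$, the returned rate is $0$; making one of the path coefficients negative would be one way to see the assumption fail.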






{\bf Results.}
We consider the subject whose ID number is 1 and denote her covariate values by $c_1$.
We define the treatment values as $x'=(1,1)$, $x=(4,2)$, set the outcome threshold to $y=10$, and let the evidence be ${\cal E}=\emptyset$.
The estimates of $\text{\normalfont T-PNS}$,
$\text{\normalfont ND-PNS}^{{M}}$, and $\text{\normalfont NI-PNS}^{{M}}$ 
at $(y;x',x,{\cal E},c_1)$ are $15.259\%$ (CI: $[0.000\%, 33.022\%]$), $1.032\%$ (CI: $[0.000\%, 7.452\%]$), and $14.226\%$ (CI: $[0.000\%, 29.405\%]$), respectively.
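As an arithmetic check on the point estimates just reported (not a general identity asserted here), the estimates of $\text{\normalfont ND-PNS}^{{M}}$ and $\text{\normalfont NI-PNS}^{{M}}$ sum to $15.258\%$, which agrees with the T-PNS estimate of $15.259\%$ up to rounding in the last reported digit:

```python
# Point estimates (in %) at (y; x', x, E, c_1), copied from the text.
T_PNS = 15.259
ND_PNS_M = 1.032
NI_PNS_M = 14.226

# Discrepancy between the total and the sum of its two components.
gap = T_PNS - (ND_PNS_M + NI_PNS_M)
print(f"T-PNS - (ND-PNS^M + NI-PNS^M) = {gap:.3f} percentage points")
```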
%of $\text{\normalfont T-PNS}$, $\text{\normalfont PNS}^{X \rightarrow Y}$, $\text{\normalfont PNS}^{X \rightarrow {N} \rightarrow Y}$, $\text{\normalfont PNS}^{X \rightarrow {M} \rightarrow {N} \rightarrow Y}$, and $\text{\normalfont PNS}^{X \rightarrow {M} \rightarrow Y}$ given $C=c_1$ are 
%\begin{align}
%&\text{\normalfont T-PNS}: &15.259 \% (\text{CI}: [0.000\%,33.022\%]).\nonumber\\ 
%&\text{\normalfont ND-PNS}^{{M}}: &1.032 \% (\text{CI}: [0.000\%,7.452\%]),\nonumber\\
%&\text{\normalfont NI-PNS}^{{M}}: &14.226 \% (\text{CI}: [0.000\%,29.405\%]).\nonumber
%&\text{\normalfont PNS}^{X \rightarrow Y}: &0.149 \% (\text{CI}: [0.000\%,2.304\%]),\nonumber\\
%&\text{\normalfont PNS}^{X \rightarrow {N} \rightarrow Y}: &0.883 \% (\text{CI}: [0.000\%,6.239\%]),\nonumber\\
%&\text{\normalfont PNS}^{X \rightarrow {M} \rightarrow {N} \rightarrow Y}:  &0.000 \% (\text{CI}: [0.000\%,0.000\%]),\nonumber\\
%&\text{\normalfont PNS}^{X \rightarrow {M} \rightarrow Y}:  &14.226 \% (\text{CI}: [0.000\%,29.405\%]).\nonumber
%\end{align}
The results indicate that, for an estimated $15.259\%$ of subjects with covariates $c_1$, studying 4 hours a week and taking extra classes would be necessary and sufficient to achieve a score above 10 in the final period.
Moreover, when the score in the second period is not treated as a separate mediator, the estimate of $\text{\normalfont NI-PNS}^{{M}}$ ($14.226\%$) is close to that of T-PNS: studying 4 hours a week and taking extra classes would remain necessary and sufficient at nearly the same level if the influence operated solely through the score in the first period.
%or through both the scores in the first and second periods.
%However, the results do not account for the path through the score in the second period.
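For intuition about how a T-PNS estimate can be obtained from a regression fit, the following sketch uses a standard identification result: if $Y_{x'} \leq Y_x$ holds for every unit (monotonicity) and the treatment is unconfounded given $C$, then $\mathbb{P}(Y_{x'} < y \leq Y_x \mid C=c) = \mathbb{P}(Y \geq y \mid X=x, C=c) - \mathbb{P}(Y \geq y \mid X=x', C=c)$. The sketch further assumes a Gaussian linear model for the final score, which ignores that the scores are integers between 0 and 20. It is not the estimator of Section 5, and the intercept, coefficients, and noise scale in the usage lines are hypothetical (the covariate contribution is folded into the intercept).

```python
from statistics import NormalDist


def linear_mean(intercept, coefs, values):
    """Fitted mean of a linear regression: intercept + sum(coef * value)."""
    return intercept + sum(b * v for b, v in zip(coefs, values))


def t_pns_monotone_gaussian(mu_x, mu_x_prime, sigma, y):
    """P(Y_x >= y) - P(Y_{x'} >= y) when Y | X, C ~ Normal(mu, sigma^2).

    mu_x and mu_x_prime are the fitted means at (x, c) and (x', c).
    The difference equals the PNS only under monotonicity; it is clipped at
    zero because a negative value would contradict that assumption.
    """
    std = NormalDist()
    p_x = 1.0 - std.cdf((y - mu_x) / sigma)
    p_x_prime = 1.0 - std.cdf((y - mu_x_prime) / sigma)
    return max(p_x - p_x_prime, 0.0)


# Usage with hypothetical numbers for x = (4, 2) and x' = (1, 1).
mu_x = linear_mean(8.0, (0.6, 0.8), (4, 2))   # 8 + 0.6*4 + 0.8*2
mu_xp = linear_mean(8.0, (0.6, 0.8), (1, 1))  # 8 + 0.6*1 + 0.8*1
print(t_pns_monotone_gaussian(mu_x, mu_xp, sigma=3.0, y=10))
```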





We ask four further causal questions about this dataset, now distinguishing the paths through the scores in the first and second periods:
\begin{center}
%\vspace{-0.2cm}
({\bf Q-a1}'). {\it Would studying 4 hours a week and taking extra classes still be necessary and sufficient to achieve a score above 10 in the final period if the influence through the scores in the first and second periods had not existed?
%, compared to studying 1 hour a week and taking no extra classes?
}\\%\vspace{0.2cm}
({\bf Q-a2}'). {\it Would studying 4 hours a week and taking extra classes still be necessary and sufficient to achieve a score above 10 in the final period if the influence occurred only through the scores in the second period, and not through the scores in the first period?
%, compared to studying 1 hour a week and taking no extra classes?
}\\%\vspace{0.2cm}
({\bf Q-b1}'). {\it Would studying 4 hours a week and taking extra classes still be necessary and sufficient to achieve a score above 10 in the final period if the influence occurred only through the scores in the first period and, in turn, those in the second period?
%, compared to studying 1 hour a week and taking no extra classes?
}\\%\vspace{0.2cm}
({\bf Q-b2}'). {\it Would studying 4 hours a week and taking extra classes still be necessary and sufficient to achieve a score above 10 in the final period if the influence occurred only through the scores in the first period, and not through the scores in the second period?
%, compared to studying 1 hour a week and taking no extra classes?
}
%\vspace{-0.2cm}
\end{center}
The corresponding estimates of $\text{\normalfont PNS}^{X \rightarrow Y}$, $\text{\normalfont PNS}^{X \rightarrow {N} \rightarrow Y}$, $\text{\normalfont PNS}^{X \rightarrow {M} \rightarrow {N} \rightarrow Y}$, and $\text{\normalfont PNS}^{X \rightarrow {M} \rightarrow Y}$ at $(y;x',x,{\cal E},c_1)$ are
%of $\text{\normalfont T-PNS}$, $\text{\normalfont PNS}^{X \rightarrow Y}$, $\text{\normalfont PNS}^{X \rightarrow {N} \rightarrow Y}$, $\text{\normalfont PNS}^{X \rightarrow {M} \rightarrow {N} \rightarrow Y}$, and $\text{\normalfont PNS}^{X \rightarrow {M} \rightarrow Y}$ given $C=c_1$ are 
%\vspace{-0.1cm}
\begin{align}
%&\text{\normalfont T-PNS}: &15.259 \% (\text{CI}: [0.000\%,33.022\%]),\nonumber\\ 
%&\text{\normalfont ND-PNS}^{{M}}: &1.032 \% (\text{CI}: [0.000\%,7.452\%]),\nonumber\\
%&\text{\normalfont NI-PNS}^{{M}}: &14.226 \% (\text{CI}: [0.000\%,29.405\%]),\nonumber\\
\text{\normalfont PNS}^{X \rightarrow Y} &: 0.149\% \ (\text{CI}: [0.000\%, 2.304\%]),\nonumber\\
\text{\normalfont PNS}^{X \rightarrow {N} \rightarrow Y} &: 0.883\% \ (\text{CI}: [0.000\%, 6.239\%]),\nonumber\\
\text{\normalfont PNS}^{X \rightarrow {M} \rightarrow {N} \rightarrow Y} &: 0.000\% \ (\text{CI}: [0.000\%, 0.000\%]),\nonumber\\
\text{\normalfont PNS}^{X \rightarrow {M} \rightarrow Y} &: 14.226\% \ (\text{CI}: [0.000\%, 29.405\%]).\nonumber
\end{align}
%The results suggest that the answers for (Q-a1'), (Q-a2'), and (Q-b1') could be "no", and the answer for (Q-b2') could be "yes".
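The results suggest that, even when the path through the score in the second period $N$ is taken into account, the necessity and sufficiency of the treatment are almost entirely attributable to the path $X \rightarrow {M} \rightarrow Y$, that is, to the indirect influence that operates solely through the score in the first period $M$.
Accordingly, the answer to (Q-b2') could be ``yes,'' whereas the answers to (Q-a1'), (Q-a2'), and (Q-b1') could be ``no.''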
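The path-specific point estimates above can be cross-checked against the estimates of $\text{\normalfont ND-PNS}^{{M}}$ and $\text{\normalfont NI-PNS}^{{M}}$ reported earlier: the two paths that bypass $M$ ($X \rightarrow Y$ and $X \rightarrow N \rightarrow Y$) add up to $1.032\%$, the two paths through $M$ add up to $14.226\%$, and the single path $X \rightarrow M \rightarrow Y$ accounts for roughly $93\%$ of the T-PNS estimate ($14.226/15.259 \approx 0.932$). The sketch below only restates that arithmetic; grouping the paths this way is a reading of the reported figures, not a general result stated in this section.

```python
# Path-specific point estimates (in %) at (y; x', x, E, c_1), from the display.
PATHS = {
    "X->Y": 0.149,
    "X->N->Y": 0.883,
    "X->M->N->Y": 0.000,
    "X->M->Y": 14.226,
}
T_PNS = 15.259  # total PNS estimate (in %)

# Paths that bypass M versus paths that go through M.
bypass_m = PATHS["X->Y"] + PATHS["X->N->Y"]         # compare: ND-PNS^M = 1.032
through_m = PATHS["X->M->N->Y"] + PATHS["X->M->Y"]  # compare: NI-PNS^M = 14.226

# Share of the total attributable to the path X -> M -> Y alone.
share_m_only = PATHS["X->M->Y"] / T_PNS
print(f"bypass M: {bypass_m:.3f}%, through M: {through_m:.3f}%, "
      f"X->M->Y share of T-PNS: {share_m_only:.1%}")
```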
%Furthermore, studying 4 hours a week and taking extra classes would remain necessary and sufficient at the same level of T-PNS if the influence existed solely through the scores in the first period.





We also report the estimates under the evidence ${\cal E} = (X=0, 10 \leq Y < 15)$, keeping all other settings as in Appendix \ref{appE}. 
%Additionally, we present the estimates, setting the evidence as ${\cal E} = (X=0, 10 \leq Y < 15)$, while keeping the other settings the same as in Appendix \ref{appE}.
The results for the subpopulation defined by this evidence are similar.
