We provide additional numerical experiments.


{\bf Simulation for the central moments of causal effects.}
%We assume the following SCM:
%\begin{equation}
%Y:=-(X+1)U, X \sim \text{Bern}(0.5), U\sim \text{Unif}(0,1)
%\end{equation}
%$\text{Bern}(0.5)$ is a Bernoulli distribution with probability $0.5$, and $\text{Unif}(0,1)$ is a uniform distribution of $[0,1]$.
%The central moments of the causal effects $\mathbb{E}[\{(Y_1-Y_0)-(\mathbb{E}[Y_1]-\mathbb{E}[Y_0])\}^m]$ are equal to $\mathbb{E}[(-U+\mathbb{E}[U])^m]$ for $m=1,\dots$.
%We simulate 1000 times with the sample size $N=20,100,10000$, respectively.
We set $N_1$, $N_2$, $N_3$, and $N_4$ all equal to 10.
The other settings are the same as in the body of the paper.



The ground truth of the variance of $Y_1-Y_0$ is $0.084$, and the estimates of variance are
\begin{center}
\textbf{$N=20$}: $0.113$ (95\%CI: $[0.000,0.640]$),\\\vspace{0.1cm}
\textbf{$N=100$}: $0.093$ (95\%CI: $[0.000,0.4168]$),\\\vspace{0.1cm}
%\textbf{$N=1000$}: $ 0.08520678$ (95\%CI: $[0.03706546,0.1485271]$),
\textbf{$N=10000$}: $0.084$ (95\%CI: $[0.016,0.427]$).
\end{center}



The ground truth of the skewness of $Y_1-Y_0$ is $0$, and the estimates of skewness are
\begin{center}
\textbf{$N=20$}: $0.005$ (95\%CI: $[-1.110,1.504]$),\\\vspace{0.1cm}
\textbf{$N=100$}: $0.007$ (95\%CI: $[-0.817,1.326]$),\\\vspace{0.1cm}
%\textbf{$N=1000$}: $-0.02439405$ (95\%CI: $[-0.3006143,0.2382194]$),
\textbf{$N=10000$}: $0.001$ (95\%CI: $[-0.590,1.120]$).
\end{center}



The ground truth of the kurtosis of $Y_1-Y_0$ is $1.798$, and the estimates of kurtosis are
\begin{center}
\textbf{$N=20$}: $0.213$ (95\%CI: $[0,2.97]$),\\\vspace{0.1cm}
\textbf{$N=100$}: $0.174$ (95\%CI: $[0,2.666]$),\\\vspace{0.1cm}
%\textbf{$N=1000$}: $0.1347321$ (95\%CI: $[0,0.6077389]$),\\\vspace{0.1cm}
\textbf{$N=10000$}: $0.177$ (95\%CI: $[0,2.499]$).
\end{center}
All means of the estimators are close to the ground truth. 
However, the estimators have wide 95\% CIs at small sample sizes, and the CIs narrow only slowly as $N$ grows, especially for higher-order moments.


{\bf Simulation for the central product moments of causal effects.}
%We assume the following SCM:
%\begin{equation}
%Y:=X^2U, X \sim \text{Bern}(0.5), U\sim \text{Unif}(0,1)
%\end{equation}
%The covariance of the causal effect $\mathbb{E}[\{(Y_1-Y_0)-(\mathbb{E}[Y_1]-\mathbb{E}[Y_0])\}\{(Y_0-Y_{-1})-(\mathbb{E}[Y_0]-\mathbb{E}[Y_{-1}])\}]$ are equal to $\mathbb{E}[-(U-\mathbb{E}[U])^2]$.
%The variances of the causal effects $\mathbb{E}[\{(Y_1-Y_0)-(\mathbb{E}[Y_1]-\mathbb{E}[Y_0])\}^2]$ and $\mathbb{E}[\{(Y_0-Y_{-1})-(\mathbb{E}[Y_0]-\mathbb{E}[Y_{-1}])\}^2]$ are equal to $\mathbb{E}[(-U+\mathbb{E}[U])^2]$.
%We simulate 1000 times with the sample size $N=30,100,10000$, respectively.
We set $N_1$ and $N_2$ both equal to 10.
The other settings are the same as in the body of the paper.


The ground truth of the covariance of $Y_1-Y_0$ and $Y_0-Y_{-1}$ is $-0.084$, and the estimates of covariance are
\begin{center}
\textbf{$N=30$}: $-0.068$ (95\%CI: $[-0.173,0]$),\\\vspace{0.1cm}
\textbf{$N=100$}: $-0.079$ (95\%CI: $[-0.188,-0.008]$),\\\vspace{0.1cm}
\textbf{$N=10000$}: $-0.083$ (95\%CI: $[-0.179,-0.015]$).
\end{center}



The ground truth of the correlation of $Y_1-Y_0$ and $Y_0-Y_{-1}$ is $-1$, and the estimates of correlation are
\begin{center}
\textbf{$N=30$}: $-0.845$ (95\%CI: $[-1,0]$),\\\vspace{0.1cm}
\textbf{$N=100$}: $-0.936$ (95\%CI: $[-1,-0.661]$),\\\vspace{0.1cm}
\textbf{$N=10000$}: $-0.990$ (95\%CI: $[-1,-0.968]$).
\end{center}
All means of the estimators are close to the ground truth. 
However, the estimators have wide 95\% CIs at small sample sizes.









%%%%%%%%%%%%%%


{\bf Simulation for the central moments of causal effects.}
%Next, we perform experiments to illustrate finite-sample properties of the estimator.
We assume the following SCM:
\begin{equation}
Y:=-(X+1)U, \quad X \sim \text{Bern}(0.5), \quad U\sim \text{Unif}(0,1),
\end{equation}
where $\text{Bern}(0.5)$ is the Bernoulli distribution with success probability $0.5$, and $\text{Unif}(0,1)$ is the uniform distribution on $[0,1]$.
This setting satisfies Assumptions \ref{ASEXO2} and \ref{MONO2}.
The outcome $Y$ takes values in the bounded interval $[-2,0]$.
%The central moments of the causal effects $\overline{\mu}^{(m)}$ are equal to $\mathbb{E}[(-U+\mathbb{E}[U])^m]$ for $m=1,\dots$.
We run 1000 simulations for each of the sample sizes $N=20$, $100$, and $10000$.
We set $N_1$, $N_2$, $N_3$, and $N_4$ all equal to 100.
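
The ground-truth values reported below can be checked by hand. The following is a sketch that assumes the potential outcome $Y_x$ is obtained by setting $X=x$ in the structural equation; the shorthand $D$ is introduced here only for illustration.
\[
D := Y_1-Y_0 = -2U-(-U) = -U, \qquad \mathbb{E}[D]=-\frac{1}{2},
\]
\[
\mathbb{E}[(D-\mathbb{E}[D])^2]=\frac{1}{12}\approx 0.083, \qquad
\mathbb{E}[(D-\mathbb{E}[D])^3]=0, \qquad
\mathbb{E}[(D-\mathbb{E}[D])^4]=\frac{1}{80}.
\]
Under these assumptions, the skewness is $0$ and the kurtosis is $(1/80)/(1/12)^2=1.8$.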



{\bf Results (Ours).}
We present the estimates obtained using our proposed method.
The ground truth of the variance of $Y_1-Y_0$ is $0.083$, and the estimates of variance are
\begin{center}
\textbf{$N=20$}:\, \, \, \,  $0.107$ (95\%CI: $[0.000,0.304]$),\\\vspace{0.1cm}
\textbf{$N=100$}:\, \, \,  $0.088$ (95\%CI: $[0.017,0.191]$),\\\vspace{0.1cm}
%\textbf{$N=1000$}: $ 0.08520678$ (95\%CI: $[0.03706546,0.1485271]$),
\textbf{$N=10000$}: $0.083$ (95\%CI: $[0.016,0.173]$).
\end{center}
The ground truth of the skewness of $Y_1-Y_0$ is $0$, and the estimates of skewness are
\begin{center}
\textbf{$N=20$}:\, \, \, \, \, \, \, $0.250$ (95\%CI: $[-4.847,6.86]$),\\\vspace{0.1cm}
\textbf{$N=100$}:\, \, \, $-0.340$ (95\%CI: $[-4.622,4.278]$),\\\vspace{0.1cm}
%\textbf{$N=1000$}: $-0.02439405$ (95\%CI: $[-0.3006143,0.2382194]$),
\textbf{$N=10000$}: $-0.212$ (95\%CI: $[-3.691,3.386]$).
\end{center}
The ground truth of the kurtosis of $Y_1-Y_0$ is $1.8$, and the estimates of kurtosis are
\begin{center}
\textbf{$N=20$}:\, \, \, \, $2.245$ (95\%CI: $[0.000,29.601]$),\\\vspace{0.1cm}
\textbf{$N=100$}:\, \, \,  $2.076$ (95\%CI: $[0.000,18.079]$),\\\vspace{0.1cm}
\textbf{$N=10000$}: $1.915$ (95\%CI: $[0.000,17.389]$).
\end{center}
All means of the estimators are close to the ground truth. 
However, the estimators have wide 95\% CIs at small sample sizes, and the CIs narrow only slowly as $N$ grows, especially for higher-order moments.



{\bf Results \citep{Heckman1997}.}
We present the estimates obtained using the method of \citet{Heckman1997}.
%The ground truth of the variance of $Y_1-Y_0$ is $0.083$, and 
The estimates of variance are
\begin{center}
\textbf{$N=20$}:\, \, \, $0.104$ (95\%CI: $[0.019,0.233]$),\\\vspace{0.1cm}
\textbf{$N=100$}:\, \,  $0.083$ (95\%CI: $[0.049,0.132]$),\\\vspace{0.1cm}
\textbf{$N=10000$}: $0.084$ (95\%CI: $[0.079,0.088]$).
\end{center}
%The ground truth of the skewness of $Y_1-Y_0$ is $0$, and 
The estimates of skewness are
\begin{center}
\textbf{$N=20$}:\, \,  \, $-0.078$ (95\%CI: $[-1.451,1.428]$),\\\vspace{0.1cm}
\textbf{$N=100$}:\, \, \,  $0.017$ (95\%CI: $[-0.954,0.883]$),\\\vspace{0.1cm}
\textbf{$N=10000$}: $0.003$ (95\%CI: $[-0.099,0.084]$).
\end{center}
%The ground truth of the kurtosis of $Y_1-Y_0$ is $1.8$, and 
The estimates of kurtosis are
\begin{center}
\textbf{$N=20$}:\,  \, \, \, $2.320$ (95\%CI: $[1.295,4.130]$),\\\vspace{0.1cm}
\textbf{$N=100$}:\, \, \,  $2.069$ (95\%CI: $[1.447,3.552]$),\\\vspace{0.1cm}
\textbf{$N=10000$}: $1.801$ (95\%CI: $[1.733,1.872]$).
\end{center}
The estimators proposed by \citet{Heckman1997} are more efficient than ours: their 95\% CIs are narrower at every sample size.
However, their estimators are not applicable to a discrete outcome and cannot be used to compute bounds on the moments of causal effects.
We provide further details on the bounds of the moments of causal effects in Appendix \ref{appF1} and present additional experiments for a discrete outcome in Appendix \ref{appF2}.


{\bf Simulation for the central product moments of causal effects.}
We assume the following SCM:
\begin{equation}
Y:=X^2U, \quad X \sim \text{Bern}(0.5), \quad U\sim \text{Unif}(0,1).
\end{equation}
The outcome $Y$ takes values in the bounded interval $[0,1]$.
%The covariance of the causal effect $\overline{\rho}_{i,j;k,h}$ is equal to $\mathbb{E}[-(U-\mathbb{E}[U])^2]$.
%The variances of the causal effects $\mathbb{E}[\{(Y_1-Y_0)-(\mathbb{E}[Y_1]-\mathbb{E}[Y_0])\}^2]$ and $\mathbb{E}[\{(Y_0-Y_{-1})-(\mathbb{E}[Y_0]-\mathbb{E}[Y_{-1}])\}^2]$ are equal to $\mathbb{E}[(-U+\mathbb{E}[U])^2]$.
This setting satisfies Assumptions \ref{ASEXO2} and \ref{MONO2}.
We run 1000 simulations for each of the sample sizes $N=30$, $100$, and $10000$.
We set $N_1$ and $N_2$ both equal to 100.
\citet{Heckman1997} did not study the product moments of causal effects.
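
The ground-truth values reported below can be checked by hand. The following is a sketch that assumes the potential outcome $Y_x$ is obtained by setting $X=x$ in the structural equation, including at $x=-1$; the shorthands $D_1$ and $D_2$ are introduced here only for illustration.
\[
D_1 := Y_1-Y_0 = U, \qquad D_2 := Y_0-Y_{-1} = -U,
\]
\[
\mathrm{Cov}(D_1,D_2) = -\mathrm{Var}(U) = -\frac{1}{12}\approx -0.083, \qquad
\mathrm{Corr}(D_1,D_2) = \frac{-\mathrm{Var}(U)}{\mathrm{Var}(U)} = -1.
\]
Under these assumptions, the two causal effects are perfectly negatively correlated because both are driven by the same $U$ with opposite signs.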


{\bf Results (Ours).}
The ground truth of the covariance of $Y_1-Y_0$ and $Y_0-Y_{-1}$ is $-0.083$, and the estimates of covariance are
\begin{center}
\textbf{$N=30$}:\, \, \, \, \, \, $-0.068$ (95\%CI: $[-0.173,0.000]$),\\\vspace{0.1cm}
\textbf{$N=100$}:\, \, \,  $-0.079$ (95\%CI: $[-0.175,-0.011]$),\\\vspace{0.1cm}
\textbf{$N=10000$}: $-0.083$ (95\%CI: $[-0.174,-0.015]$).
\end{center}
The ground truth of the correlation of $Y_1-Y_0$ and $Y_0-Y_{-1}$ is $-1$, and the estimates of correlation are
\begin{center}
\textbf{$N=30$}:\, \, \, \, \, \,   $-0.845$ (95\%CI: $[-1.000,0.000]$),\\\vspace{0.1cm}
\textbf{$N=100$}:\, \, \,  $-0.936$ (95\%CI: $[-1.000,-0.661]$),\\\vspace{0.1cm}
\textbf{$N=10000$}: $-0.990$ (95\%CI: $[-1.000,-0.968]$).
\end{center}
All means of the estimators are close to the ground truth. 
However, the estimators have wide 95\% CIs at small sample sizes.
We present additional experiments for a discrete outcome in Appendix \ref{appF2}.


