\documentclass{uai2025} % for initial submission
%\documentclass[accepted]{uai2025} % after acceptance, for a revised version; 
% also before submission to see how the non-anonymous paper would look like 
       

\newcommand{\removed}[1]{}
\usepackage{times}
\usepackage{soul}
\usepackage{url}
%\usepackage{hyperref}
\usepackage[utf8]{inputenc}
%\usepackage[small]{caption}
\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{amsthm}
\usepackage{booktabs}
\usepackage{algorithm}
\usepackage{algorithmic}
%\usepackage[switch]{lineno}
\usepackage{stackengine}
\def\defeq{\mathrel{\ensurestackMath{\stackon[1pt]{=}{\scriptscriptstyle\Delta}}}}





%\usepackage{wrapfig,lipsum,booktabs}

\usepackage{amssymb}
\usepackage{mathtools}
\usepackage{amsthm}



\usepackage{lscape}
% if you use cleveref..
\usepackage[capitalize,noabbrev]{cleveref}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% THEOREMS
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\theoremstyle{plain}

% Todonotes is useful during development; simply uncomment the next line
%    and comment out the line below the next line to turn off comments
%\usepackage[disable,textsize=tiny]{todonotes}
\usepackage[textsize=tiny]{todonotes}
\usepackage{multirow}

\usepackage{ascmac}
%\usepackage{fancybx}
\usepackage{float}
\usepackage{perpage}
\MakeSorted{figure}
\MakeSorted{table}

\usepackage{url}
\usepackage{natbib}
\usepackage{chapterbib}

\usepackage{color}
\usepackage{tikz}
\tikzset{%
mynode/.style={circle,minimum width=.5ex, fill=none,draw}, % no filling
myfillnode/.style={circle,minimum width=.5ex, fill=lightgray,draw}, % fill with light gray
}

\newcommand{\0}{$\mathrm{I}$}
\newcommand{\2}{$\mathrm{I}\hspace{-1.2pt}\mathrm{I}$}
\newcommand{\3}{$\mathrm{I}\hspace{-1.2pt}\mathrm{I}\hspace{-1.2pt}\mathrm{I}$}
\newcommand{\4}{$\mathrm{I}\hspace{-1.2pt}\mathrm{V}$}
%\newcommand{\3}{$\mathrm{i}$}
%\newcommand{\4}{$\mathrm{i}\hspace{-0.8pt}\mathrm{i}$}
%\newcommand{\5}{$\mathrm{i}\hspace{-0.8pt}\mathrm{i}\hspace{-0.8pt}\mathrm{i}$}
\newcommand{\6}{$\mathrm{i}\hspace{-0.8pt}\mathrm{v}$}
\newcommand{\indep}{\perp \!\!\! \perp}
%\usepackage[dvipdfmx]{graphicx}
%\bibliographystyle{unsrtnat}
%\DeclareMathOperator*{\argmin}{arg\,min}
%\DeclareMathOperator*{\argmax}{arg\,max}
% The \icmltitle you define below is probably too long as a header.
% Therefore, a short form for the running title is supplied here:
\usepackage{amsmath,amsthm}
\newtheorem{theorem}{Theorem}
\newtheorem{definition}{Definition}
\newtheorem{assumption}{Assumption}
\newtheorem{lemma}{Lemma}
\newtheorem{proposition}{Proposition}
\newtheorem{corollary}{Corollary}
\usepackage{comment}
\usepackage{here}
\allowdisplaybreaks[4]
%\usepackage{bbm}
\usepackage{caption}
\usepackage{bbding}
\usepackage{arydshln}
\usepackage{afterpage}

%\usepackage{algpseudocode}
\usepackage{mathrsfs}
\DeclareMathOperator*{\plim}{p-lim}

\newcommand{\jin}[1]{\textcolor{blue}{[[#1]]}}
\newcommand{\jina}[1]{\textcolor{blue}{#1}}
\newcommand{\yuta}[1]{\textcolor{red}{#1}}
\newcommand{\error}[1]{\textcolor{green}{#1}}


%% There is a class option to choose the math font
% \documentclass[mathfont=ptmx]{uai2024} % ptmx math instead of Computer
                                         % Modern (has noticeable issues)
% \documentclass[mathfont=newtx]{uai2024} % newtx fonts (improves upon
                                          % ptmx; less tested, no support)
% NOTE: Only keep *one* line above as appropriate, as it will be replaced
%       automatically for papers to be published. Do not make any other
%       change above this note for an accepted version.

%% Choose your variant of English; be consistent
\usepackage[american]{babel}
% \usepackage[british]{babel}

%% Some suggested packages, as needed:
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
% \usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams

%% Provided macros
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)

%% Self-defined macros
\newcommand{\swap}[3][-]{#3#1#2} % just an example

\title{Moments of Causal Effects}

% The standard author block has changed for UAI 2024 to provide
% more space for long author lists and allow for complex affiliations
%
% All author information is automatically removed by the class for the
% anonymous submission version of your paper, so you can already add your
% information below.
%
% Add authors
\author[1]{\href{mailto:<jj@example.edu>?Subject=Your UAI 2024 paper}{Jane~J.~von~O'L\'opez}{}}
\author[1]{Harry~Q.~Bovik}
\author[1,2]{Further~Coauthor}
\author[3]{Further~Coauthor}
\author[1]{Further~Coauthor}
\author[3]{Further~Coauthor}
\author[3,1]{Further~Coauthor}
% Add affiliations after the authors
\affil[1]{%
    Computer Science Dept.\\
    Cranberry University\\
    Pittsburgh, Pennsylvania, USA
}
\affil[2]{%
    Second Affiliation\\
    Address\\
    …
}
\affil[3]{%
    Another Affiliation\\
    Address\\
    …
  }
  
\begin{document}



Thank you for your valuable feedback. We hope that the responses below adequately address your concerns and lead to a positive reassessment of our paper.

>Comment:
The identifiability results seems to be incremental in comparison to the work of Kawakami et al. [2024a]. 
More specifically, the integral (5) and theorem 1 in this paper follow directly from Eq (12) and theorems given by Kawakami et al. [2024a].
Similarly, can be obtained the results regarding identifiability of $E[(Y_i - Y_j)(Y_k - Y_h)]$.

Our response:
(i) There may be a misunderstanding regarding Eq. (12) of Kawakami et al. [2024a] and Eq. (5) of this paper. The two are entirely different, unrelated equations. Eq. (5) of this paper expresses the moments of causal effects, $E[(Y_1-Y_0)^m]$, in a new form and is derived via Lemma 1. In contrast, Eq. (12) of Kawakami et al. [2024a] merely states that the probability of necessity and sufficiency, $PNS(y; x_0,x_1)=P(Y_{x_0}<y\leq Y_{x_1})$, can be computed from the conditional PNS, $PNS(y; x_0,x_1,c)=P(Y_{x_0}<y\leq Y_{x_1}\mid C=c)$, as $PNS(y; x_0,x_1)=\int PNS(y; x_0,x_1,c)\,p(c)\,dc$.

(ii) The identification result in Theorem 1 follows from two ingredients: Lemma 1, whose derivation is nontrivial, and the fact that joint probabilities of potential outcomes of the form $\mathbb{P}(Y_0<y_1\leq Y_1,Y_0<y_2\leq Y_1,\dots,Y_0<y_m\leq Y_1)$ are identifiable, which is Theorem 5.2 of Kawakami et al. [2024a]. Similarly, the identification result in Theorem 3 for $E[(Y_i - Y_j)(Y_k - Y_h)]$ follows from Lemma 3 of this paper and Theorem 5.2 of Kawakami et al. [2024a].
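
For concreteness, we sketch the elementary identity behind this step. This is a simplified illustration that assumes $E[|Y_1-Y_0|^m]<\infty$; it is not a verbatim restatement of Lemma 1 or Eq. (5). For real numbers $a<b$, $(b-a)^m$ equals the volume of the cube $(a,b]^m$, that is, $\int_{\mathbb{R}^m}\prod_{k=1}^m I(a<y_k\leq b)\,dy_1\cdots dy_m$. Splitting on the sign of $Y_1-Y_0$ and applying Tonelli's theorem to each nonnegative part gives
\begin{align*}
E[(Y_1-Y_0)^m]
&= \int_{\mathbb{R}^m} \mathbb{P}(Y_0<y_1\leq Y_1,\dots,Y_0<y_m\leq Y_1)\,dy_1\cdots dy_m \\
&\quad + (-1)^m \int_{\mathbb{R}^m} \mathbb{P}(Y_1<y_1\leq Y_0,\dots,Y_1<y_m\leq Y_0)\,dy_1\cdots dy_m .
\end{align*}
Thus the $m$-th moment is expressed entirely through joint probabilities of interval events for the potential outcomes, which is what allows identification results for such probabilities to be brought to bear on the moments.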

(iii) In summary, the identification results in this paper rely on one result of Kawakami et al. [2024a], namely Theorem 5.2, which establishes that joint probabilities of potential outcomes of the form $\mathbb{P}(Y_0<y_1\leq Y_1,Y_0<y_2\leq Y_1,\dots,Y_0<y_m\leq Y_1)$ or $\mathbb{P}(Y_j<y_1\leq Y_i,Y_h<y_2\leq Y_k)$ are identifiable. Apart from this mathematical result, the topic of this paper is fundamentally different from that of Kawakami et al. [2024a]. The target of Kawakami et al. [2024a] is the probabilities of causation (PoC), which are probabilities of certain counterfactual events related to the necessity and sufficiency of the treatment. In contrast, the target of this paper is the moments of causal effects, which are statistical quantities describing the shape of the causal effect distribution.
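
The same idea extends to the product moments. As a sketch, assuming $E[|Y_i-Y_j|\,|Y_k-Y_h|]<\infty$ (again an illustration rather than a restatement of Lemma 3), note that $b-a=\int_{\mathbb{R}}\{I(a<y\leq b)-I(b<y\leq a)\}\,dy$ for any real $a$ and $b$. Multiplying two such representations and taking expectations term by term (Fubini's theorem) gives
\begin{align*}
E[(Y_i-Y_j)(Y_k-Y_h)]
= \int_{\mathbb{R}^2} \Bigl\{ &\mathbb{P}(Y_j<y_1\leq Y_i,\,Y_h<y_2\leq Y_k) - \mathbb{P}(Y_j<y_1\leq Y_i,\,Y_k<y_2\leq Y_h) \\
&{}- \mathbb{P}(Y_i<y_1\leq Y_j,\,Y_h<y_2\leq Y_k) + \mathbb{P}(Y_i<y_1\leq Y_j,\,Y_k<y_2\leq Y_h) \Bigr\}\,dy_1\,dy_2 ,
\end{align*}
where each of the four joint probabilities has the form $\mathbb{P}(Y_{j'}<y_1\leq Y_{i'},Y_{h'}<y_2\leq Y_{k'})$ for a relabelling of the indices.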

>Comment:
Although the authors provide a bound for the moments of treatment causal effect, but it is not clear how tight it is. 


Our response:
We will add the following after Theorem 2:

"The upper bound of the Fréchet inequalities is always sharp [1]; thus, the function $u$ in Lemma 2 is sharp for all $m$.
In contrast, the lower bound of the Fréchet inequalities is not sharp in general, except when $m = 1$; hence, the function $l$ in Lemma 2 is not sharp in general.
As a result, the upper bounds on the moments of causal effects are sharp when $m$ is even.
In all other cases, our bounds on the moments of causal effects are not sharp."
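
For reference, the Fréchet inequalities in their standard form for arbitrary events $A_1,\dots,A_n$ read as follows (the specific events that enter $u$ and $l$ in Lemma 2 are defined in the paper and are not repeated here):
\[
\max\Bigl\{0,\ \sum_{k=1}^{n}\mathbb{P}(A_k)-(n-1)\Bigr\}
\;\leq\; \mathbb{P}\Bigl(\bigcap_{k=1}^{n}A_k\Bigr)
\;\leq\; \min_{1\leq k\leq n}\mathbb{P}(A_k).
\]
The upper bound is attained when the events are nested, and the lower bound $\sum_{k}\mathbb{P}(A_k)-(n-1)$ is attained when the complements $A_1^c,\dots,A_n^c$ are pairwise disjoint.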

We will add the following sentence after Theorem 4:

"The bounds for product moments are not sharp."

We will add the following to the conclusion:

"Deriving bounds tighter than those provided by the Fréchet inequalities remains a challenging open mathematical problem when $m \geq 2$. Some studies [2,3] provide improved Fréchet–Hoeffding bounds by incorporating additional information.
Extending such approaches to our setting is left for future work."


[1] Nelsen, R. B. An Introduction to Copulas. Springer, 2006.

[2] Lux, T., and Papapantoleon, A. Improved Fréchet–Hoeffding bounds on $d$-copulas and applications in model-free finance. 2017, pp. 3633--3671.

[3] Bartl, D., et al. Marginal and dependence uncertainty: bounds, optimal transport, and sharpness. arXiv preprint arXiv:1709.00641, 2017.

>Comment:
How it (the bounds) would behave when the number of samples increase?


Our response:
Our bounds are defined in terms of population probabilities rather than empirical estimates, so the bounds themselves do not become tighter as the number of samples increases. What improves with more samples is the reliability of their empirical estimates. For example, in Table 1, the confidence intervals of the estimated lower and upper bounds ($\sigma_L$ and $\sigma_U$) become narrower as the number of samples increases.
Consequently, the interval from the lower confidence limit of $\sigma_L$ to the upper confidence limit of $\sigma_U$ also becomes narrower as the sample size increases.


>Comment:
The work lacks of novelty and seems for me to be incremental given the results of Kawakami et al. [2024a].

Our response:
The main contribution of this paper is to provide identification results and bounds for the higher moments (and product moments) of causal effects, whereas most existing work focuses on average causal effects such as the ACE or CACE. These moments serve as "statistical measures" that characterize the shape of the causal effect distribution. Our results provide novel tools for studying the heterogeneity of causal effects and for gaining a deeper understanding of how causal effects differ across individuals.
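
As a simple illustration of how these moments describe the shape of the distribution (a standard identity, not a new result of the paper), the variance of the individual causal effect $Y_1-Y_0$ is determined by its first two moments:
\[
\mathrm{Var}(Y_1-Y_0) = E[(Y_1-Y_0)^2] - \bigl(E[Y_1-Y_0]\bigr)^2 .
\]
Because the ACE $E[Y_1-Y_0]=E[Y_1]-E[Y_0]$ depends only on the marginal distributions of $Y_0$ and $Y_1$ (and is therefore point-identified whenever they are), an identification result or a pair of bounds for $E[(Y_1-Y_0)^2]$ translates directly into an identification result or bounds for the variance of causal effects; higher moments play the same role for skewness and kurtosis.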


On the other hand, Kawakami et al. [2024a] studied the probabilities of causation (PoC).
PoC are a family of "probabilistic measures" that quantify whether one event was the actual cause of another in a given scenario. The subject of Kawakami et al. [2024a] is therefore entirely different from that of this work. Although the identification results of this work leverage a specific mathematical result of Kawakami et al. [2024a], namely that joint probabilities of potential outcomes of the form $\mathbb{P}(Y_0<y_1\leq Y_1,Y_0<y_2\leq Y_1,\dots,Y_0<y_m\leq Y_1)$ or $\mathbb{P}(Y_j<y_1\leq Y_i,Y_h<y_2\leq Y_k)$ are identifiable, it is not obvious from this result alone that $E[(Y_1-Y_0)^m]$ or $E[(Y_i - Y_j)(Y_k - Y_h)]$ is identifiable. The derivations of our identification results involve important nontrivial intermediate steps, namely Lemma 1 and Lemma 3. Overall, this work makes substantial novel contributions beyond merely applying existing results.









%%%%%%%%%%

We appreciate the reviewer's positive reassessment of our work.

>Comment:
Also, I am still not convinced that the given bound for the causal effect is significant. My main concern is what are conditions on the probabilities when the inequalities are tight and how accuracy is affected when these conditions do not hold. Similar questions were raised by other reviewers as well.

Our response:
Deriving tight bounds for counterfactual probabilities of these forms remains an open problem. Without additional information, we are not aware of any tools that yield tighter bounds than those provided by the Fréchet inequalities.

>Comment:
I am still concerned regarding the significance of the paper results.

Our response:
We believe the significance of the paper's results lies in providing novel tools for studying the heterogeneity of causal effects by identifying and bounding their higher moments. This contrasts with most existing work, which focuses on the average causal effect (ACE) over the population or on conditional average causal effects (CACE); the latter capture heterogeneity only across subpopulations defined by observed covariates.
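
One way to make this contrast precise is the law of total variance. Writing $W$ for the observed covariates and assuming $\mathrm{Var}(Y_1-Y_0)<\infty$,
\[
\mathrm{Var}(Y_1-Y_0)
= \underbrace{\mathrm{Var}\bigl(E[Y_1-Y_0\mid W]\bigr)}_{\text{across subpopulations}}
+ \underbrace{E\bigl[\mathrm{Var}(Y_1-Y_0\mid W)\bigr]}_{\text{within subpopulations}} .
\]
The CACE $E[Y_1-Y_0\mid W]$ determines only the first term. The second term, which reflects heterogeneity among individuals who share the same covariate values, equals $E[(Y_1-Y_0)^2]-(E[Y_1-Y_0])^2-\mathrm{Var}(E[Y_1-Y_0\mid W])$, so it can be recovered once the second moment $E[(Y_1-Y_0)^2]$ is available, but not from the CACE alone.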






%Thank you for your reassessment.

%>Comment:
%Thank you for the clarification. I agree that the problem considered by Kawakami et al. [2024a] is different and that one of the main contributions of this work is Lemmas 1 and 3, for which further the results of Kawakami et al. [2024a] were used.  However, the results of Lemma 1 or 3 are straightforward. For instance, Lemma 1 just rewrites $(a-b)^n$ in terms of the integral over indicator function $\int I(a<x_1<b, ...,a<x_n<b) d x_1... d x_n$, where the integral obviously corresponds to the volume of the high-dimensional cube. 

%Our response:
%Yes, the representations in Lemma 1 and Lemma 3, which involve the volume of a high-dimensional cube, are a core idea of our paper. We connected ideas from the PoC literature with the moments of causal effects through Lemma 1 and Lemma 3.

%When $Y_0$ and $Y_1$ share the same distribution, the bounds reduce to a single point at 0. In this case, all $\sigma^{(m)}$, $\sigma^{(m)}_L$, and $\sigma^{(m)}_U$ are equal to 0 for any $m$. This corresponds to a homogeneous case where the causal effect is null for all individuals. As the distributions of $Y_0$ and $Y_1$ differ more significantly, reflecting increased heterogeneity in causal effects, the bounds tend to widen.

%\jin{I don't see what point you are trying to make with the above response. I'll just respond with the following if you have nothing better to say about how to get tighter bounds. }
%\yuta{[Sure]}
%\jin{I don't follow what questions the reviewer is asking. The reviewer may have a misunderstanding about bounds.}





%As described in our response to the other reviewers, we will add the following to the Introduction section to further clarify the importance of understanding the shape of the distribution of causal effects:

%"The shape of the distribution of causal effects uncovers causal effect heterogeneity, which is an actively researched topic in the field of statistics, causal inference, and machine learning. Causal effect heterogeneity refers to the variation in causal effects across individuals or subgroups within a population. Existing work on causal effect heterogeneity mainly examines the conditional average causal effects (CACE), $E[Y_1-Y_0|W]$, based on subjects’ covariates $W$. However, CACE captures only the heterogeneity across subpopulations specified by observed covariates $W$, not the heterogeneity across individuals. In contrast, the shape of the distribution of causal effects reveals the heterogeneity of causal effects across individuals and provides complementary information to CACE."





%The situation in which the function $u$ in Lemma 2 is sharp for all $m$ corresponds to the case where the potential outcomes $Y_0$ and $Y_1$ are "countercomonotonic".
%Countercomonotonicity describes the opposite dependence structure to monotonicity.

%(Monotonicity over $f_Y$)
%"The function $f_Y(x,U_Y)$ is either (i) monotonic increasing on $U_Y$ for all $x \in \{0,1\}$ almost surely w.r.t. $\mathbb{P}_{U_Y}$, or (ii) monotonic decreasing on $U_Y$ for all $x \in \{0,1\}$ almost surely w.r.t. $\mathbb{P}_{U_Y}$."

%(Countercomonotonicity over $f_Y$)
%"The functions $f_Y(1,U_Y)$ and $f_Y(0,U_Y)$ satisfy one of the following two conditions:
%(i) $f_Y(1,U_Y)$ is monotonic increasing on $U_Y$ and $f_Y(0,U_Y)$ is monotonic decreasing on $U_Y$ almost surely w.r.t. $\mathbb{P}_{U_Y}$,
%or (ii) $f_Y(1,U_Y)$ is monotonic decreasing on $U_Y$ and $f_Y(0,U_Y)$ is monotonic increasing on $U_Y$ almost surely w.r.t. $\mathbb{P}_{U_Y}$."

%When $Y_1$ and $Y_0$ are continuous variables, monotonicity corresponds to a rank correlation (Spearman's $\rho$) of $1$ between $Y_1$ and $Y_0$, and countercomonotonicity to a rank correlation of $-1$.






%PoC do not provide insights into the shape of the causal effect distribution. Thus, our results offer new insights into the shape of the causal effect distribution that have not been revealed through PoC.


\end{document}

%First, Eq. (5) in our paper and Eq. (12) given by Kawakami et al. [2024a] are completely different equations. Eq. (12) given by Kawakami et al. [2024a] states that the probability of necessity and sufficiency (PNS), $PNS(y; x_0,x_1)=P(Y_{x_0}<y\leq Y_{x_1})$, can be computed through conditional PNS $PNS(y; x_0,x_1,c)=P(Y_{x_0}<y\leq Y_{x_1}|C=c)$ as $PNS(y; x_0,x_1)=\int PNS(y; x_0,x_1,c)p(c)dc$. Eq. (5) in our paper provides a new representation of the moments of causal effects, $E[(Y_1-Y_0)^m]$. You may have misunderstood Eq. (12) given by Kawakami et al. [2024a] or Eq. (5) in our paper. Therefore, our theorems are not merely incremental extensions of those presented by Kawakami et al. [2024a], as they are fundamentally based on our novel representation of the moments of causal effects. Similarly, the identifiability of $E[(Y_i-Y_j)(Y_k-Y_h)]$ is also a novel contribution.

%Next, we emphasize that the objectives of Kawakami et al. [2024a] and those of our paper are fundamentally different. The target of Kawakami et al. [2024a] is the probabilities of causation (PoC), which are expressed as the probabilities of specific counterfactual events related to the necessity and sufficiency of the treatment. In contrast, the target of our paper is the moments of causal effects, which are statistical quantities related to the shape of the causal effect distribution. Therefore, the ideas presented by Kawakami et al. [2024a] are not directly related to those developed in our paper, even though they may appear mathematically similar.


%Our bounds presented in the paper are independent of the sample size, as they are derived based on probabilities rather than empirical estimates. Empirical estimates based on large sample sizes yield accurate evaluations of the bounds; however, the bounds themselves do not become tighter as the sample size grows.


%Our contribution is to provide the identification, bounds, and estimation methods of the moments (or product moments) of causal effects. These moments serve as "statistical measures" that characterize the shape of the causal effect distribution.
