\documentclass{article}
\usepackage[margin=1.2in]{geometry}
\usepackage[utf8]{inputenc}

 
\usepackage{graphicx} % Required for inserting images
\usepackage{microtype}
\usepackage{subcaption}
\usepackage{booktabs} % for professional tables
\usepackage{multirow}
\usepackage{multicol}

\usepackage{hyperref}
\usepackage{algorithm}
\usepackage[noend]{algorithmic}
\usepackage{pifont}
% \usepackage{algorithmicx}
% For theorems and such
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{mathtools}
\usepackage{amsthm}
\usepackage{wrapfig}
\usepackage{dsfont}
% if you use cleveref..
\usepackage[capitalize, noabbrev]{cleveref}
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% THEOREMS
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\theoremstyle{plain}
\newtheorem{theorem}{Theorem}[section]
\newtheorem{example}[theorem]{Example}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{corollary}[theorem]{Corollary}
\theoremstyle{definition}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{assumption}[]{Assumption}
\theoremstyle{remark}
\newtheorem{remark}[theorem]{Remark}

% Todonotes is useful during development; simply uncomment the next line
%    and comment out the line below the next line to turn off comments
%\usepackage[disable,textsize=tiny]{todonotes}
\usepackage[textsize=tiny]{todonotes}


%---Commands----
\newcommand{\xx}{\mathbf{x}}
\newcommand{\yy}{\mathbf{y}}
\newcommand{\LL}{\mathcal{L}}
\newcommand{\BB}{\mathcal{B}}
\newcommand{\GG}{\mathcal{G}}
\newcommand{\HH}{\mathcal{H}}
\newcommand{\DD}{\mathcal{D}}
\newcommand{\MM}{\mathcal{M}}
\newcommand{\FF}{\mathcal{F}}
\newcommand{\A}{\mathcal{A}}
\newcommand{\CC}{\mathcal{C}}
\newcommand{\RR}{\mathcal{R}}
\newcommand{\TT}{\mathcal{T}}
\newcommand{\I}{\mathcal{I}}
\newcommand{\J}{\mathcal{J}}
\newcommand{\id}{\mathrm{I}}
\newcommand{\PP}{\mathcal{P}}

\DeclareMathOperator{\NGASS}{NG}

\newcommand{\Z}{\mathbb{Z}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\C}{\mathbb{C}}
\newcommand{\prob}{\mathbb{P}}
\newcommand{\pp}{\mathbf{P}}
\newcommand{\toric}{\mathsf{toric}}
\newcommand{\evec}{\begin{pmatrix}\varepsilon_1\\\varepsilon_2\\\varepsilon_3\end{pmatrix}}
\newcommand{\xvec}{\begin{pmatrix}X_1\\X_2\\X_3\end{pmatrix}}

\newcommand{\mB}{\mathbf{B}}
\newcommand{\tmB}{\Tilde{\mathbf{B}}}
\newcommand{\mA}{\mathbf{A}}
\newcommand{\tmA}{\Tilde{\mathbf{A}}}

\newcommand{\indep}{\perp \!\!\! \perp}
\newcommand{\dsep}{\perp \!}
\usepackage{paralist}


\DeclareMathOperator{\loc}{local}
\DeclareMathOperator{\glo}{global}
\DeclareMathOperator{\rank}{rank}
\DeclareMathOperator{\pred}{pred}
\DeclareMathOperator{\image}{Im}
\DeclareMathOperator{\doop}{do}
\newcommand{\minus}{\scalebox{0.75}[1.0]{$-$}}
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\DeclareMathOperator{\Sym}{Sym}
%\DeclareMathOperator{\pa}{pa}
%\DeclareMathOperator{\de}{de}
%\DeclareMathOperator{\an}{an}
%\DeclareMathOperator{\nd}{nd}

% \SetKwInput{KwInput}{Input}
% \SetKwInput{KwOutput}{Output}

\newcommand\independent{\protect\mathpalette{\protect\independenT}{\perp}}
\def\independenT#1#2{\mathrel{\rlap{$#1#2$}\mkern2mu{#1#2}}}

\newcommand*{\dependent}{\centernot{\independent}}

\def\newop#1{\expandafter\def\csname #1\endcsname{\mathop{\rm
#1}\nolimits}}

\newop{Inv}
\newop{conv}
\newop{pa}
\newop{de}
\newop{nd}
\newop{GL}
\newop{O}
\newop{ch}
\newop{CS}
\newop{diag}
\newop{Var}
\newop{top}
\newop{sib}
\newop{an}


\title{Rebuttal}
\author{}
\date{}

\begin{document}
\maketitle
\section{Reviewer 1}
\begin{itemize}
    \item \textbf{Q1 Summary And Contributions:}
    This paper addresses a causal inference problem: how to estimate the causal effect of a treatment variable on an outcome in the presence of a latent confounder. The authors utilize multi-environment data to propose new identification conditions, proving that when the target causal effect remains invariant across environments, causal effect identification can be achieved under specific conditions. The main contribution of the paper lies in proposing a moment-based estimation algorithm that performs well when only a single parameter in the data-generating mechanism (whether it be the exogenous noise distribution or the causal relationship between variables) varies across environments; simultaneously, the authors rigorously prove that identifiability is lost when the exogenous noise distributions of both the latent and treatment variables vary across environments. The paper validates the effectiveness of the proposed methods through experiments on synthetic data.
    \item \textbf{Q3 Main Strengths:}
    \begin{itemize}
        \item This paper decomposes the univariate case into four scenarios, proposing a specific identification algorithm for each. It utilizes the properties of higher-order moments and independence assumptions, with strong theoretical support and rigorous proof processes.
        \item With larger sample sizes, the identification performance of the proposed method is superior to comparative methods and demonstrates greater stability.
    \end{itemize}
    \item \textbf{Q4 Main Weakness:}
    \begin{itemize}
        \item In all experiments, under small sample conditions, the method proposed in this paper is less stable than the comparative methods.
        \item In the experiments with Gumbel distribution, in case 4, the performance is significantly inferior to the comparative methods.
        \item For case 2, the identifiability proof requires the critical additional assumption that the error term $\epsilon_t$ follows a non-Gaussian distribution.
    \end{itemize}
    \item \textbf{Q5 Detailed Comments To The Authors:} See the weaknesses above.
    \item \textbf{Q7 Justification For Your Score:} See the weaknesses above.
\end{itemize}

\subsection{Answer}
{
\color{blue}
We thank the reviewer for their detailed comments and helpful suggestions.
}

\begin{itemize}
    \item "In all experiments, under small sample conditions, the method proposed in this paper is less stable than the comparative methods."

    {\color{blue}
    Indeed, under small sample sizes, our estimation algorithms tend to be less stable than the comparative methods, as our algorithms rely on unbiased estimates of higher-order moments (third order or higher). In contrast, linear regression solutions can be expressed in terms of covariances and variances of the observed data, which typically have lower sample variance. However, it is important to note that our proposed algorithms consistently converge to the true value as the sample size increases—that is, they are consistent estimators—whereas the comparative methods exhibit systematic bias regardless of sample size. Furthermore, even with limited data, our less stable estimate achieves better relative accuracy than the biased estimates produced by other methods.
    
    %Indeed, under a small sample size, our estimation algorithms are less stable than comparative methods because they use unbiased estimates of high-order moments (order 3 or higher). In contrast, the solution of the linear regression can be expressed in terms of covariances/variances of the observed data, having lower sample variance. It is noteworthy that, from the experiments, we observe that our proposed algorithms converge to the true value as the sample size increases) (i.e., they are consistent estimators) while comparative methods have a systematic bias regardless of the number of samples. Moreover, even with a small sample size, our less stable estimate still has better relative accuracy than the biased estimate obtained by other methods.
    }
    \item "In the experiments with Gumbel distribution, in case 4, the performance is significantly inferior to the comparative methods."
    {\color{blue}
    %It is a really interesting observation raised by the reviewer. 
    There is a typo in the caption of Figure 6: the distribution should be Logistic rather than Gumbel. As the reviewer pointed out, in the case where $\gamma$ differs across the environments, our estimation algorithm underperforms in the first three error boxes. The reason, as seen in Figure 15(d), is that the value of $n$ is not recovered correctly for this distribution when the sample size is not large enough, so Algorithm 4 uses an incorrect estimation formula to compute the target causal effect. When the sample size increases, the correct value of $n$ is recovered, and the subsequent error boxes show that our estimate is more accurate and, unlike the estimates obtained by the comparative methods, unbiased. This can be checked with our code (\url{https://anonymous.4open.science/r/IdentificationMultipleDomain/README.md}) by restricting the y-axis of Figure 6(c) to $(-1, 1)$. We will add this discussion and these adjustments to the appendix in the final version. Note that for all other distributions and experiments, our algorithm clearly outperforms the comparative methods.
    
    It is noteworthy that $n$ is often assumed to be known in the related literature on causal effect estimation via higher-order moments (e.g., [1]). To the best of our knowledge, our work is the first that does not assume it in the experiments.
    }

    \item "For case 2, the identifiability proof requires the critical additional assumption that the error term $\epsilon_t$ follows a non-Gaussian distribution."

    {\color{blue}
    The non-Gaussianity assumption is a common and important assumption required in a wide range of literature on causal effect estimation and causal discovery [1, 2, 3, 4, 5, 6]. More specifically, for Gaussian distributions, some causal relationships are indistinguishable from observational data, making non-Gaussianity a crucial requirement for identifiability.
    %, while non-Gaussianity introduces distinctive features like skewness, kurtosis, etc. T
    This issue is well illustrated by the non-identifiability results of [2, 6] for linear Gaussian models. Note that, unlike many other works ([1, 3, 4, 5, 6]), we do not require that \textit{all} the exogenous noises be non-Gaussian.
    Instead, in 3 out of 4 cases, it suffices that only one specific exogenous noise term is non-Gaussian (the exogenous noise of $T$ for the second and third cases; the exogenous noise of $U$ for the fourth case).

    [1] Schkoda, D., Robeva, E., and Drton, M. Causal discovery of linear non-Gaussian causal models with unobserved confounding.
    
    [2] Yaroslav Kivva, Saber Salehkaleybar, and Negar Kiyavash. A cross-moment approach for causal effect estimation. Advances in Neural Information Processing Systems, 36, 2024.

    [3] Cai, R., Huang, Z., Chen, W., Hao, Z., and Zhang, K. Causal discovery with latent confounders based on higher-order cumulants. In Proceedings of the 40th International Conference on Machine Learning, ICML’23. JMLR.org, 2023.

    [4] S. Salehkaleybar, A. Ghassami, N. Kiyavash, and K. Zhang. Learning linear non-Gaussian causal models in the presence of latent variables. J. Mach. Learn. Res., 21:39–1, 2020.

    [5] Patrik O Hoyer, Shohei Shimizu, Antti J Kerminen, and Markus Palviainen. Estimation of causal effects using linear non-Gaussian causal models with hidden variables. International Journal of Approximate Reasoning, 49(2):362–378, 2008.

    [6] Jan Eriksson and Visa Koivunen. Identifiability, separability, and uniqueness of linear ICA models. IEEE Signal Processing Letters, 11(7):601–604, 2004.
    }
    
\end{itemize}
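
{\color{blue}
To make the small-sample point in our first answer above more concrete, here is a minimal illustration. It assumes i.i.d.\ samples $X_1,\dots,X_N$ with finite moments and plain raw-moment estimators; the notation $m_k$, $\hat m_k$ is introduced here only for illustration, and the estimators used inside our algorithms may differ in detail. The sample moment $\hat m_k = \frac{1}{N}\sum_{i=1}^{N} X_i^k$ is unbiased for $m_k = E[X^k]$, and
\[
    \Var(\hat m_k) = \frac{m_{2k} - m_k^2}{N}.
\]
For a standard Gaussian variable, chosen here purely because its moments are simple ($m_2 = 1$, $m_3 = 0$, $m_4 = 3$, $m_6 = 15$, $m_8 = 105$), this gives
\begin{align*}
    N \Var(\hat m_2) &= m_4 - m_2^2 = 3 - 1 = 2, \\
    N \Var(\hat m_3) &= m_6 - m_3^2 = 15 - 0 = 15, \\
    N \Var(\hat m_4) &= m_8 - m_4^2 = 105 - 9 = 96.
\end{align*}
In this example, at the same sample size $N$, the variance of the fourth-moment estimate is $48$ times that of the second-moment estimate, which is consistent with the wider error boxes of moment-based estimators at small $N$.
}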


\section{Reviewer 2}

\begin{itemize}
    \item \textbf{Q1 Summary And Contributions:}
    This paper studies the problem of causal effect identification between treatment and outcome variables with a hidden confounder using multiple environments. The authors adopt the framework of linear structural causal models and common structures across environments. They show that, under several assumptions, the causal effect can be identified and further propose an impossibility result.
    \item \textbf{Q3 Main Strengths:}
    The problem addressed in the paper is important and interesting. The results obtained may define a first step from which one can build new results.
    \item \textbf{Q4 Main Weakness:}
    Several assumptions are made by the authors in order to identify results, which may not be realistic. The framework considered is simple, and even too simple I believe.
    \item \textbf{Q5 Detailed Comments To The Authors:} If the overall problem is interesting, the framework adopted here may be simplistic; it furthermore relies on many assumptions, all which may not be realistic. First of all, the causal graph retained does not incorporate observed confounders between the treatment and outcome variables and/or descendants of the hidden confounder different from the treatment variable. These simple extensions are common and should at least discussed.

    Along the same line, it would be nice to have examples of different environments one can consider, and of different environments in which the assumptions made make sense. The example given after Assumption 2 is not really convincing because the effect of a certain medication may indeed depend on the country the medication is taken.
    
    The invariance assumption across environments is also rather strong. In order to identify the causal effect between treatment and outcome variables with five environments, you need to have, unless I am mistaken, all parameters of the different structural equations but one (different form the noise of the outcome variable) to be invariant across all environments. Could you give examples of such cases?
    
    As your results mainly consist in sufficient conditions for identifiability, it would be nice to have examples showing that these conditions are not necessary.
    
    Additional comments:
    \begin{itemize}
        \item Could you give the complexity of computing $\beta$
 in the different cases retained?
        \item here is a typo in Eq. 14, Th. 3.3: D should be replaced by T.
        \item Still in Th. 3.3: can you explain how you use Th.1 of Kiva et al. 2024 to estimate $\gamma$? I didn't have the time to delve on that but I had the impression that Kiva et al. use evidence from an additional variable in their development.
    \end{itemize}
    \item \textbf{Q7 Justification For Your Score:} The framework retained is too simple and based on several assumptions which are not realistic in practice. The impact of the paper is thus limited.
\end{itemize}

\subsection{Answer}
{
\color{blue}
We thank the reviewer for their detailed comments and helpful suggestions.
}

\begin{itemize}
    \item "First of all, the causal graph retained does not incorporate observed confounders between the treatment and outcome variables and/or descendants of the hidden confounder different from the treatment variable. These simple extensions are common and should at least discussed."

    {\color{blue}
    We respectfully disagree with the reviewer's characterization of the model as simplistic. As shown in [1], the treatment effect is not identifiable from observations in a single environment alone, so additional information is needed to enable identifiability. For instance, prior work has considered incorporating instrumental variables or two proxy variables, as in [2]; the latter requirement was later relaxed to a single proxy variable in [3].

In contrast, our setting assumes no access to such additional variables. Instead, we leverage observations from multiple environments and establish sufficient conditions under which the causal effect becomes identifiable. Moreover, the complexity of our results is evident from the algorithms and the associated theoretical proofs, which go well beyond a simplistic analysis.

We also thank the reviewer for the suggestion regarding the graphical model extension and will incorporate this discussion into the main text. Specifically, when observed confounders are present, the problem of identifying the treatment effect remains equivalent to the setting we address. This is because, by regressing the treatment and outcome on the observed covariates and working with the residuals, the problem reduces to the case without observed covariates—a similar procedure was done in [3]. Additionally, descendants of hidden confounders that are not the treatment variable can be handled similarly, following the approach in [3].
    %We would like to disagree with the reviewer that the model considered is simplistic. [1] showed that the treatment causal effect is actually not identifiable given observations just from one environment. Therefore we require additional information to enable identification result. For example, it would be enough to have additional information such as instrumental variable, or two additional proxy variables as in [2], which were later relaxed to one proxy variable in [3]. In contrast, we assume that no other information in terms of additional variables is given, but we may have observations from multiple environments. For which we establish sufficient conditions when the causal effect is identifiable. Furthermore, it is clear from the algorithms and the proofs that the results are far from simple. 

    %We thank the reviewer for the suggestions regarding the graph extension and we will add this discussion to the main part. More specifically, in the presence of the observed confounders, the problem of treatment effect identification is still equivalent to the problem we solve in our work. This follows from the fact that if we regress the treatment and outcome on the observed covariates and consider the residuals after the regression then the problem reduces exactly to the case without observed covariates (a similar procedure was done in [3]). Note, that escendants of the hidden confounder different from the treatment variable can be addressed similar to the work of [3].

    [1] S. Salehkaleybar, A. Ghassami, N. Kiyavash, and K. Zhang. Learning linear non-Gaussian causal models in the presence of latent variables. J. Mach. Learn. Res., 21:39–1, 2020.

    [2] M. Kuroki and J. Pearl. Measurement bias and effect restoration in causal inference. Biometrika, 101 (2):423–437, 2014.

    [3] Yaroslav Kivva, Saber Salehkaleybar, and Negar Kiyavash. A cross-moment approach for causal effect estimation. Advances in Neural Information Processing Systems, 36, 2024.
    }

    \item "the example given after Assumption 2 is not really convincing because the effect of a certain medication may indeed depend on the country the medication is taken."

    {
    \color{blue}
    Thank you for the suggestion. We will update the example to the one proposed in [4], which considers the effect of sleeping pills on lung disease using electronic health records collected from multiple hospitals. The causal effect of sleeping pills on lung disease is assumed to remain consistent across different hospitals.
    %Thank you for the suggestion. We will change the example to the one proposed in [4]: the effect of sleeping pills on lung disease using electronic health records, collected from multiple hospitals around the world. The different hospitals serve different populations, but the causal mechanism between sleeping pills and lung disease remains the same across hospitals. Let us know please whether you agree with it.

    [4] Shi, Claudia, Victor Veitch, and David M. Blei. "Invariant representation learning for treatment effect estimation." In Uncertainty in artificial intelligence, pp. 1546-1555. PMLR, 2021.
    }
    \item " In order to identify the causal effect between treatment and outcome variables with five environments, you need to have, unless I am mistaken, all parameters of the different structural equations but one (different form the noise of the outcome variable) to be invariant across all environments. Could you give examples of such cases?"

    {\color{blue}
    We believe there may be a misunderstanding. We do not require all but one of the parameters in the structural equations to remain invariant across all five environments. In fact, we only need that among the five environments, there exist two in which exactly one parameter differs (excluding the noise term of the outcome variable). These two environments are sufficient to establish identifiability based on the available observations.
    %We believe that there is may be some misunderstanding. We do not need that all the parameters of the different structural equations but one to be invariant across all 5 environments. Actually, we just need that among these 5 environments there \textbf{exists} just two such that there is only one parameter changes among them (different form the noise of the outcome variable) and then we may use these observations for the identifiability.
    }

    \item "As your results mainly consist in sufficient conditions for identifiability, it would be nice to have examples showing that these conditions are not necessary."

    {
    \color{blue}
    We believe that our conditions are also necessary for identifiability in the sense that, if any one of them is removed, the treatment effect may no longer be uniquely identifiable. However, formally proving this necessity is beyond the scope of our current work and can be explored in future research. 
    %As noted in our response to Reviewer tFfg, the non-Gaussianity assumption is a common and well-established condition in the causal effect identification literature.
    Also note that the non-Gaussianity assumption is a common and important assumption in the literature on causal effect estimation and causal discovery [1, 2, 3, 4, 5, 6]. More specifically, for Gaussian distributions, some causal relationships are indistinguishable from observational data.
    %, while non-Gaussianity introduces distinctive features like skewness, kurtosis, etc. T
    This issue is well illustrated by the non-identifiability results of [2, 6] for linear Gaussian models. Note that, unlike many other works ([1, 3, 4, 5, 6]), we do not require that \textit{all} the exogenous noises be non-Gaussian.
    Instead, in 3 out of 4 cases, it suffices that only one specific exogenous noise term is non-Gaussian (the exogenous noise of $T$ for the second and third cases; the exogenous noise of $U$ for the fourth case).
    Additionally, we demonstrated that when two environments differ in more than one parameter, the treatment effect is not uniquely identifiable. This highlights that the condition limiting the number of varying parameters between environments is essential and cannot be lifted. %relaxed.
    %We believe that our conditions are also necessary for the identifiability results in a sense that if we ignore one of the conditions the treatment effect will not be uniquely identifiable. However proving this fact is out of scope of our work and can be considered as a future work. Note that the non-Gaussianity conditions is a common assumption in literature of causal effect identification as it was discussed in the response to the Reviewer tFfg. Additionally, we showed the cases when two environments differ in two parameters and that is treatment causal effect is not identifiable uniquely. This shows that a condition on number of parameters that vary across two environments can not be just lifted. 

    [1] Schkoda, D., Robeva, E., and Drton, M. Causal discovery of linear non-Gaussian causal models with unobserved confounding.
    
    [2] Yaroslav Kivva, Saber Salehkaleybar, and Negar Kiyavash. A cross-moment approach for causal effect estimation. Advances in Neural Information Processing Systems, 36, 2024.

    [3] Cai, R., Huang, Z., Chen, W., Hao, Z., and Zhang, K. Causal discovery with latent confounders based on higher-order cumulants. In Proceedings of the 40th International Conference on Machine Learning, ICML’23. JMLR.org, 2023.

    [4] S. Salehkaleybar, A. Ghassami, N. Kiyavash, and K. Zhang. Learning linear non-Gaussian causal models in the presence of latent variables. J. Mach. Learn. Res., 21:39–1, 2020.

    [5] Patrik O Hoyer, Shohei Shimizu, Antti J Kerminen, and Markus Palviainen. Estimation of causal effects using linear non-Gaussian causal models with hidden variables. International Journal of Approximate Reasoning, 49(2):362–378, 2008.

    [6] Jan Eriksson and Visa Koivunen. Identifiability, separability, and uniqueness of linear ICA models. IEEE Signal Processing Letters, 11(7):601–604, 2004.
    }

    \item "Could you give the complexity of computing $\beta$ in the different cases retained?"

    {
    \color{blue}
    The computational complexity of Algorithms 1, 2, 3, and 4 
    %within each iteration of the while loop
    is linear with respect to the number of samples, as all moment estimation procedures can be performed with complexity $O(N)$, where $N$ is the number of samples, and the number of loop iterations is independent of $N$. Note that the total number of loop iterations depends on certain properties of the underlying distributions, which fall into two types:
    %The Algorithms 1, 2, 3 and 4 are linear in terms of the number of samples, because all the moment estimation procedures can be done by $O(N)$ complexity ($N$ - number of samples). However, this in the Algorithms 1, 2, 3 and 4 present some while loops the length of which depend on the properties of case specific distributions. These properties fall into two category:
    \begin{itemize}
        \item number of loop iterations $\sim$ the order of the lowest moment that differs between the environments for the distribution that is supposed to vary according to the conditions of the theorem (as in Algorithm 1);
        \item number of loop iterations $\sim$ the smallest $n$ such that $E[\epsilon^n]\neq (n-1)E[\epsilon^{n-2}]E[\epsilon^2]$, which indicates that the distribution is non-Gaussian (as in Algorithm 3 or in the procedure \textit{GetRatio}).
    \end{itemize}
    It is worth noting that in both cases, it is natural to assume that the order of the computed moments is bounded by some small constant (typically less than 10). Therefore, in practice, the overall complexity of the algorithms remains $O(N)$.
    %Note, that for both of these conditions it is natural to assume that these parameters are less than 7 to have these assumptions practically feasible. Therefore, in practice the complexity of all the models would be reduced to O(N), since all the loops would not be no longer than 7 steps.
    }

    \item "Still in Th. 3.3: can you explain how you use Th.1 of Kiva et al. 2024 to estimate $\gamma$?"

    {\color{blue}
    Kivva et al. (2024) proposed an algorithm, \textit{GetRatio}, that provably identifies the ratio $a/b$ from observations $(X_1, X_2)$ generated by the structural causal model (SCM) $X_1 = a\epsilon + \epsilon_1$ and $X_2 = b\epsilon + \epsilon_2$. Notably, the algorithm does not require any additional information, as can be verified from its description in Kivva et al. (2024). Specifically, the variables $D$ and $Z$ in their notation correspond to $X_1$ and $X_2$, respectively, and all quantities computed by the algorithm depend solely on $D$ and $Z$.

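In the context of our proof of Theorem 3.3, the variables $X_1$ and $X_2$ correspond to $r_1T^{(1)} - Y^{(1)}$ and $T^{(1)}$, respectively, where $r_1T^{(1)} - Y^{(1)} = -\gamma \epsilon_t^{(1)} + \epsilon_y^{(1)}$ and $T^{(1)} = \epsilon_t^{(1)} + \epsilon_u^{(1)}$. Matching terms with the SCM above gives $\epsilon = \epsilon_t^{(1)}$, $a = -\gamma$, and $b = 1$, so the ratio $a/b$ returned by \textit{GetRatio} equals $-\gamma$. We will include this clarification in the revised version.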


    %(Kivva et al. 2024) proposed an algorithm \textit{GetRatio} that provably identifies the ratio $a/b$ from  the observations $(X_1, X_2)$ that follows SCM $X_1 = a\epsilon + \epsilon_1$ and $X_2 = b\epsilon + \epsilon_2$. To do this they do not require additional information, which can be verified from the algorithm itself described in [Kivva et al. 2024]. More specifically, the variables $D$ and $Z$ from  [Kivva et al. 2024] correspond to the variables $X_1$ and $X_2$, respectively, and all the quantities computed in the algorithm are expressed only through the variables $D$ and $Z$. In the context of proof of Theorem 3.3, these variables $X_1$ and $X_2$ correspond to the quantities $r_1T^{(1)}-Y^{(1)}$, $T^{(1)}$ respectively, where $r_1T^{(1)}-Y^{(1)} = -\gamma \epsilon_t^{(1)} + \epsilon_y^{(1)}$ and $T^{(1)} = \epsilon_t^{(1)} + \epsilon_u^{(1)}$. We will add this explanation in the revised version.
    }
    
\end{itemize}
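
{\color{blue}
As a small worked example of the non-Gaussianity check in the second loop-length criterion of our complexity answer above (the two distributions below are chosen here only for illustration and are not taken from our experiments): for a zero-mean Gaussian $\epsilon$ with variance $\sigma^2$, the identity $E[\epsilon^n] = (n-1)E[\epsilon^{n-2}]E[\epsilon^2]$ holds for every $n \geq 2$, for instance $E[\epsilon^4] = 3\sigma^4$. The loop stops at the first $n$ where the identity fails:
\begin{align*}
    \epsilon \sim \mathrm{Unif}[-a,a]: \quad
        & n=3: \ E[\epsilon^3] = 0 = 2\,E[\epsilon]\,E[\epsilon^2], \\
        & n=4: \ E[\epsilon^4] = \tfrac{a^4}{5} \neq 3\left(\tfrac{a^2}{3}\right)^{2} = \tfrac{a^4}{3}
          \quad\Longrightarrow\quad \text{stop at } n = 4; \\
    \epsilon = X - 1,\ X \sim \mathrm{Exp}(1): \quad
        & n=3: \ E[\epsilon^3] = 2 \neq 2\,E[\epsilon]\,E[\epsilon^2] = 0
          \quad\Longrightarrow\quad \text{stop at } n = 3.
\end{align*}
Each iteration only needs sample moments of the form $\frac{1}{N}\sum_{i=1}^{N} x_i^k$, each computable in one pass over the data. Computing all moments up to order $K$ therefore costs $O(KN)$, which is $O(N)$ whenever $K$ is bounded by a small constant.
}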

\subsection{Round 2}
Thank you for your answers.

Regardless of whether or not the term simplistic is appropriate to describe your framework, you should not make a confusion between a simplistic analysis and a simplistic framework, and one can indeed have a complex analysis in a simplistic framework.

In your answer, you claim that "this is because, by regressing the treatment and outcome on the observed covariates and working with the residuals, the problem reduces to the case without observed covariates—a similar procedure was done in [3]. Additionally, descendants of hidden confounders that are not the treatment variable can be handled similarly, following the approach in [3]." Can you describe this procedure? I had looked at [3] when reviewing your paper but don't remember having seen this procedure. Does it hold in general or only for linear models with additive noise?

More generally, I still find that there are many assumptions made in your study and I do believe that you should justify them further. Another point related to [3] is that you borrow many (if not most) of your mathematical concepts, assumptions and approaches from this study. I was thus a bit surprised in your answer when you claim that "Moreover, the complexity of our results is evident from the algorithms and the associated theoretical proofs".

"We do not require all but one of the parameters in the structural equations to remain invariant across all five environments. In fact, we only need that among the five environments, there exist two in which exactly one parameter differs (excluding the noise term of the outcome variable). These two environments are sufficient to establish identifiability based on the available observations." Yes, but only for the environment of the pair with observational data and there is no generalisation to the other environments, right? Let us assume one has n=5 environments and that only one is based on experimental data. What are the conditions for the treatment effect to be uniquely identified on the four remaining environments? My claim that you need in this case that all but one of the parameters (excluding the noise of the outcome variable) should be invariant may be too strong, but I believe yours to be too weak.

Thank you for the clarification on complexity and Th. 3.3.

\subsubsection{Answer}
Thank you for your response. As we emphasized in the rebuttal, the setup under consideration—two observed variables with one latent confounder—represents one of the most challenging scenarios for identifying causal effects. Most prior work requires additional information, such as access to instrumental variables or proxy variables, to address this setting. Although the causal structure may appear simple, it is inherently difficult to handle.

The methods we propose can be naturally extended to the scenarios suggested by the reviewer in the context of linear models with additive noise. For the case involving observed covariates, please refer to the proof of Theorem 2 in [3]. Please make sure to consult the version published at NeurIPS 2023 (not the arXiv version).
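
As a minimal sketch of the reduction with observed covariates in the linear additive-noise setting, consider the following model. The symbols $W$, $a$, $c$, $\alpha$ are introduced here only for illustration and are not the notation of [3]; we also assume, for this sketch, that all variables are centered and that the observed covariates $W$ are independent of $(U, \epsilon_t, \epsilon_y)$:
\begin{align*}
    T &= a^\top W + \alpha U + \epsilon_t, \\
    Y &= \beta T + c^\top W + \gamma U + \epsilon_y
       = (\beta a + c)^\top W + \beta(\alpha U + \epsilon_t) + \gamma U + \epsilon_y.
\end{align*}
Under these assumptions, the population regression of $T$ on $W$ has coefficient $a$ and that of $Y$ on $W$ has coefficient $\beta a + c$, so the residuals are
\begin{align*}
    \tilde T &= T - a^\top W = \alpha U + \epsilon_t, \\
    \tilde Y &= Y - (\beta a + c)^\top W = \beta \tilde T + \gamma U + \epsilon_y.
\end{align*}
The pair $(\tilde T, \tilde Y)$ thus follows the same structural equations as the model without observed covariates, with the same target effect $\beta$. In finite samples the regression coefficients are estimated, which adds an estimation error that this sketch does not account for.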

Regarding the connection to [3], while both works utilize higher-order moments for causal effect identification, the problem settings are fundamentally different. Our work addresses a multi-environment framework, whereas [3] focuses on a single environment. Moreover, our methods are specifically designed to handle changes in parameters across different domains, an aspect not considered in [3] because that work does not have access to multiple environments. The analysis is therefore completely new.

With respect to the assumptions, Assumption 3, which requires all moments to be finite, is standard in the literature on causal discovery using higher-order moments. Moreover, Assumption 4 provides a sufficient condition to ensure that the moments uniquely determine the distribution. Any alternative assumption that offers the same guarantee could be used in its place. Please let us know if there are other assumptions we should clarify.

Again, in our work we present sufficient conditions under which the causal effect of the treatment is uniquely identified from observations obtained in two environments. That is, given five environments, to identify the causal effect of the treatment in a specific environment we only need one other environment that differs from it in a single parameter. This holds for any of the five environments and can be applied separately to each of them. It is consequently a much weaker assumption than requiring that only one parameter vary across all five environments. Note that, under the assumption that the treatment effect is invariant across all five environments, it suffices to identify it in any one of them. Finally, a solution that uses more than two environments jointly is outside the scope of this work, as mentioned at the end of Section 2.2, and would be a promising direction for future work.

\section{Reviewer 3}
\begin{itemize}
    \item \textbf{Q1 Summary And Contributions:}
    This paper presents identification criteria for causal effects under a specific causal structure assuming a linear causal model and data from two heterogeneous domains when specific parameters of the causal models are assumed to be constant between the domains.
    \item \textbf{Q3 Main Strengths:}
    The paper provides new identification criteria for causal effects with arguably reasonable assumptions.
    \item \textbf{Q4 Main Weakness:}
    The scope of the presented work is quite limited: only a specific causal structure is considered where X is a direct cause of Y, there is a latent confounder between X and Y, and the causal relationships are assumed linear.
    
    The authors claim to present estimation methods yet all results are in fact identification procedures.

    Materials for reproducing the experiments are not available.
    \item \textbf{Q5 Detailed Comments To The Authors:} The paper is fairly technical. It would be very beneficial for the reader to provide some examples and motivation about the different scenarios. What kind of information could allow us to assume that a parameter is constant between domains? Can the authors provide some examples where we have heterogenous data such that one parameter is different, and where we know that the causal model is as depicted in Figure 1.

    The authors characterization of the related work is somewhat lacking. There is plenty of research on causal transportability that deals with multi-environment settings that does not fall under the strict binary categorization provided by the authors.
    
    The assumptions of the method are clearly stated, but only Assumptions 1--3 are fairly easy to explain in simple terms. Is there a simple explanation for assumption 4 (other than its direct implication of uniqueness)?
    
    Section 3 mentions "mild non-Gaussianity assumptions". Can the authors explain how non-Gaussianity relates to the assumptions of the theorems, and perhaps on an intuitive level, why Gaussianity is a problem for the proposed method? Linear-gaussian SCMs are fairly common, so it seems that this would be an important drawback to discuss further considering the setting.
    
    The authors refer to Algorithms 1--4 as estimation algorithms yet they are presented as identifiability algorithms. In fact, no estimators are provided for any of the considered scenarios. The implicit assumptions seems to be that the theoretical moments are simply replaced by sample moments. In addition, there is no discussion on how to quantify the uncertainty of the obtained "estimates".
    
    The authors mention that the code for the submission is available online, but no link or supplementary material is provided. If I have simply missed it or if the authors can provide the code, then I will update my score regarding reproducibility accordingly.
    
    Minor comments:
    
    Sec 2.1: The sentence containing "a function capturing the causal relationship of a variable to its parents" can be misunderstood such that $f_X$ defines how $X$ affects its parents instead of the opposite.
    
    "impossibility" should just be non-identifiability.
    
    the do(.) notation should be explained.
    
    Should $\phi_n^{(1)}$ after the "and" on line 5 be $\phi_n^{(2)}$?
    
    the while loop starting on line 5 in Algorithm 4 contains the symbols $\phi_n^{(1)}$ and $\phi_n^{(1)}$ yet these are never defined or updated in the loop. Are these the same values as in Step 3 in Section 3.3?
    
    Theorem 3.3 contains the expectation of $\epsilon_t^{(n-2)}$, should it be just $\epsilon_t^{n-2}$ like in Theorems 3.4 and 3.5?
    
    In the experimental results section regarding Figure 3, it should be noted that the sample size is increasing exponentially on the horizontal axis.

    \item \textbf{Q7 Justification For Your Score:} In its current state, the cons heavily outweigh the pros of the paper. While of some theoretical interest, it is hard to imagine scenarios where the presented results could be of practical use. I also find it highly misleading to present identification results as estimation procedures, especially so when reproducibility materials are not provided.
\end{itemize}

\subsection{Answer}
\begin{itemize}
    \item "The authors mention that the code for the submission is available online, but no link or supplementary material is provided"

    {\color{blue}
    There is a hyperlink included in the paper, located in the first paragraph of Section 5 (Experimental Results), under the text “accessible online.” You can verify this by clicking on the phrase “accessible online,” which redirects to the anonymous code repository containing the experiments: \url{https://anonymous.4open.science/r/IdentificationMultipleDomain/README.md}.

Please note that when the PDF is opened in a browser, the hyperlink may not be visibly highlighted. However, it should appear clearly when viewed in a standard PDF reader or editor. In the final version of the paper, we will also include an explicit link to the experiments in a footnote for improved visibility.
    %There is a hyperlink that is accessible from the paper and which is located just right above "PRELIMINARIES" section and specified under the words "provided online". You may verify it by clicking on "provided online" and it will redirect you to the anonymous code repository with experiments (https://anonymous.4open.science/r/IdentificationMultipleDomain/README.md). It seems that, if the pdf file is opened in a browser it does not highlight it, however, if it is opened with pdf editor the hyperlink would be easily visible. In a final version, we will add an explicit link for the experiment to the footnote as well.
    }



    \item "What kind of information could allow us to assume that a parameter is constant between domains? Can the authors provide some examples where we have heterogenous data such that one parameter is different, and where we know that the causal model is as depicted in Figure 1."

    {\color{blue}
    We believe our results are particularly useful in application domains such as economics and healthcare, where it is useful to model environmental variation through changes in a single interpretable factor. While the theoretical conditions required by our methods may not hold in every real-world scenario, a useful property of our approach is that, in some cases, it can assess from observational data which parameters vary across environments. Specifically, in the proof of Theorem 3.1, we introduce several testable properties that indicate whether the parameters of the causal model vary across environments. For example, Step 1 of the proof checks whether $\gamma$ remains invariant across environments under mild assumptions. We will highlight this point more clearly in the revised version.
    %We believe that our results will be useful in settings such as genetics and healthcare, where it is possible to model environmental variations through changes in a single interpretable factor. Although the necessary theoretical conditions for our methods might not always hold in practice, we have proposed strategies to verify whether the underlying causal model meets these conditions using the collected data. In particular, in the proof of Theorem 3.1, we propose several properties that may be tested and which suggests whether the parameter is different across the environments. For instance, step 1 in the proof of Theorem 3.1 validates whether $\gamma$ is invariant across the environments under some mild assumptions. 
    
    
    
    %Remarkably, in the real world, the necessary theoretical conditions for most of the existing methods often do not hold; however, scientists decide whether this inconsistency with the theoretical conditions can be ignored, so they may obtain some tentative results via these methods. Moreover in the proof of Theorem 3.1 we propose several properties that may be tested and which suggests whether the parameter is different across the environments. For instance, step 1 in the proof of Theorem 3.1 may validate whether $\gamma$ is invariant across the environments under some mild assumptions.
    }

    \item "There is plenty of research on causal transportability that deals with multi-environment settings that does not fall under the strict binary categorization provided by the authors."

    {
    \color{blue}
    We would like to ask whether the reviewer has any specific work in mind from the causal transportability literature. While both our problem and the causal transportability problem involve multi-environment settings, their assumptions and objectives differ substantially. The main problem in causal transportability is to identify a causal effect in a target domain using experimental data from a source domain and observational data from the target, typically under the assumption that certain mechanisms remain invariant across domains. In contrast, our work aims to identify the causal effect of a treatment on an outcome using data collected from multiple environments, without access to any experimental interventions. Moreover, our results are derived under a specific causal graph structure, for which, to the best of our knowledge, no existing transportability method offers identifiability guarantees.
    %We would like to ask if the reviewer has any specific work in mind from the causal transportability literature? Although in a high level these works may seem to be related, however to the best of our knowledge the conditions and the settings is still quiet different. For example, we are not aware about any work that operates over linear SCMs, or such that can solve the problem 
    %for the graph considered in our paper. 
    }

    \item "only a specific causal structure is considered where X is a direct cause of Y, there is a latent confounder between X and Y"

    {\color{blue}
    As we highlighted in our response to Reviewer QpsT, our setting extends in a straightforward manner to cases with observed confounders of the treatment and outcome variables. Specifically, when observed confounders are present, the problem of identifying the treatment effect remains equivalent to the setting we address: by regressing the treatment and outcome on the observed covariates and working with the residuals, the problem reduces to the case without observed covariates. A similar procedure was used in [2].
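
    The residualization step can be sketched as follows. The notation here is an illustrative assumption and is not taken from the paper: $Z$ denotes the observed covariates with invertible covariance matrix, $a$ and $b$ are hypothetical coefficient vectors, $\gamma$, $\alpha$, and $\beta$ denote the effects of $U$ on $T$, of $U$ on $Y$, and of $T$ on $Y$, all variables are centered, and $Z$ is assumed independent of $U$, $\epsilon_t$, and $\epsilon_y$. Under these assumptions,
    \begin{align*}
        T &= \gamma U + a^\top Z + \epsilon_t, &
        Y &= \beta T + \alpha U + b^\top Z + \epsilon_y .
    \end{align*}
    The population regressions of $T$ and $Y$ on $Z$ then have coefficients $a$ and $\beta a + b$, respectively, so the residuals satisfy
    \begin{align*}
        \tilde{T} &= T - a^\top Z = \gamma U + \epsilon_t, &
        \tilde{Y} &= Y - (\beta a + b)^\top Z = \beta \tilde{T} + \alpha U + \epsilon_y .
    \end{align*}
    In this sketch the pair $(\tilde{T}, \tilde{Y})$ follows the covariate-free model with the same treatment effect $\beta$.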
    }

    \item "Is there a simple explanation for assumption 4 (other than its direct implication of uniqueness)?"
    
    {\color{blue}
    Assumption 4 is a sufficient condition that guarantees that the moments of the distribution define it uniquely. Any alternative assumption that provides the same guarantee can be used in its place.
    }

    \item "Section 3 mentions ”mild non-Gaussianity assumptions”. Can the authors explain how non-Gaussianity relates to the assumptions of the theorems, and perhaps on an intuitive level, why Gaussianity is a problem for the proposed method? Linear-gaussian SCMs are fairly common, so it seems that this would be an important drawback to discuss further considering the setting."

    {\color{blue}
    Non-Gaussianity is a common and important assumption in the literature on causal effect estimation and causal discovery [1, 2, 3, 4, 5, 6]. More specifically, for Gaussian distributions, some causal relationships are indistinguishable from observational data.
    %, while non-Gaussianity introduces distinctive features like skewness, kurtosis, etc. T
    This issue is well illustrated by the non-identifiability results of [2, 6] for linear Gaussian models. Note that, unlike many other works [1, 3, 4, 5, 6], we do not require that \textit{all} the exogenous noises be non-Gaussian.
    Instead, in three of the four cases, it suffices that a single specific exogenous noise term is non-Gaussian (the exogenous noise of $T$ in the second and third cases, and the exogenous noise of $U$ in the fourth case).

    [1] Schkoda, D., Robeva, E., and Drton, M. Causal discovery of linear non-gaussian causal models with unobserved confounding.
    
    [2] Yaroslav Kivva, Saber Salehkaleybar, and Negar Kiyavash. A cross-moment approach for causal effect estimation. Advances in Neural Information Processing Systems, 36, 2024.

    [3] Cai, R., Huang, Z., Chen, W., Hao, Z., and Zhang, K. Causal discovery with latent confounders based on higher-order cumulants. In Proceedings of the 40th International Conference on Machine Learning, ICML’23. JMLR.org, 2023.

    [4] S. Salehkaleybar, A. Ghassami, N. Kiyavash, and K. Zhang. Learning linear non-gaussian causal models in the presence of latent variables. J. Mach. Learn. Res., 21:39–1, 2020.

    [5] Patrik O Hoyer, Shohei Shimizu, Antti J Kerminen, and Markus Palviainen. Estimation of causal effects using linear non-gaussian causal models with hidden variables. International Journal of Approximate Reasoning, 49(2):362–378, 2008.

    [6] Jan Eriksson and Visa Koivunen. Identifiability, separability, and uniqueness of linear ica models. IEEE Signal Processing Letters, 11(7):601–604, 2004.
    
    }

    \item "The implicit assumptions seems to be that the theoretical moments are simply replaced by sample moments. In addition, there is no discussion on how to quantify the uncertainty of the obtained ”estimates”."

    {\color{blue}
    The main contribution of our work is to establish identification results for the treatment effect using observations from multiple environments in the presence of latent confounding. We showed that, under mild assumptions, the treatment effect can be uniquely identified when only a single parameter varies across environments. Furthermore, we provided a closed-form expression for the treatment effect in terms of higher-order moments. In the estimation procedure, these theoretical moments were replaced by their empirical counterparts, and we evaluated the performance of our method through comprehensive experiments, showing that the estimates converge to the true value as the sample size increases.
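
    For concreteness, the plug-in step can be written as follows. The notation $\hat{m}_{p,q}$ and the sample $(t_i, y_i)_{i=1}^{N}$ are introduced here for illustration only, and the samples are assumed to be i.i.d.:
    \[
        \hat{m}_{p,q} \;=\; \frac{1}{N}\sum_{i=1}^{N} t_i^{\,p}\, y_i^{\,q}
        \quad\text{as an estimate of}\quad E[T^{p} Y^{q}] .
    \]
    Since all moments are finite under Assumption 3, the law of large numbers gives $\hat{m}_{p,q} \to E[T^{p} Y^{q}]$ as $N \to \infty$. Wherever the closed-form expression is continuous in these moments at their true values, the resulting plug-in estimate is consistent by the continuous mapping theorem.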

While the question of uncertainty estimation is indeed important, it requires a separate theoretical analysis, which is beyond the scope of this work and represents a promising direction for future research.
    %The main contribution of our work is to provide identification results of the treatment effect given observations from two environments under the assumption of a linear SCM. We show under some mild assumptions that the treatment effect can be identified uniquely  whenever only one parameter varies across the environments, and show how to identify this parameter. Furthermore, for the identifiability result, we provided the closed form solution that gives an unbiased estimate of the treatment causal effect under an assumption of infinite data. This result is novel and to best of our knowledge there is no other work that handles it. Indeed, for the estimation procedure the theoretical moments are replaced by sample moments and we showed by experimental results that it converges to the true value. The question of uncertainty estimation is also very important; however, it requires a separate analysis and which is out-of-the-scope of this work and would be a good direction for future work.
    }

    \item "Minor comments"

    {\color{blue}
    Thank you for pointing out these typos. We will fix them in the final version and apply the changes according to your suggestions.
    }
\end{itemize}

\section{Reviewer 4}
\begin{itemize}
    \item \textbf{Q1 Summary And Contributions:}
    This paper investigates causal effect estimation in the presence of a latent confounder with data from two different environments. It first proves identifiability under the condition that only one parameter varies across environments and proposes corresponding method to estimate the causal effect. Second, it proposes a method to locate the source of change. Finally, it proves unidentifiability under the condition that two parameters vary across environments.
    \item \textbf{Q3 Main Strengths:}
    \begin{itemize}
        \item This paper investigates an interesting problem. Conventional methods for causal effect estimation typically require instrumental variables or proxy variables of the latent confounders. This paper reveals that with data from two environments, neither of them if required.
        \item This paper is technically solid. Under the condition that only one parameter changes across environments, the authors investigate all possible cases in details, they not only prove identifiability but also propose corresponding estimation methods. Besides, they even propose a method to locate the change source.
        \item This paper is well-organized and easy to follow.
    \end{itemize}
    \item \textbf{Q4 Main Weakness:}
    \begin{itemize}
        \item This paper relies heavily on the assumption that only one parameter changes across environments, I think this is a restrictive assumption in real world and it is hard to validate. The authors would be better to present the motivation of this assumption.
        \item This paper only proves unidentifiability for the case where $\{\epsilon_u, \epsilon_t\}$ 
 changes, I wounder whether it is unidentifiable when any two of the four parameters change.
    \end{itemize}
    \item \textbf{Q5 Detailed Comments To The Authors:}
    \begin{itemize}
        \item The authors mention some works leveraging optimization techniques to recover direct causes. I'm not familiar with this line of works, I will be grateful if the authors point out the superiority of their moment-based methods over these optimization-based methods. For instance, I guess the latter may get stuck in local optima easily.
        \item I think it is unnecessary to discuss IRM, which is not very related to causal effect estimation.
    \end{itemize}
    \item \textbf{Q7 Justification For Your Score:} Overall, I think this paper is technically solid, the strengths overweigh the weaknesses
\end{itemize}

\subsection{Answer}

\begin{itemize}
    \item "This paper relies heavily on the assumption that only one parameter changes across environments, I think this is a restrictive assumption in real world and it is hard to validate. The authors would be better to present the motivation of this assumption"

    {\color{blue}
    Thank you for the suggestion; we will add a discussion of this point to the final version of the paper. Specifically, we believe that in domains such as genetics and healthcare, it is often useful to model environmental variation through a change in a single, interpretable factor, such as the prevalence of a risk, the treatment assignment, or exposure to a specific soft intervention, which can be represented as a change in one distribution or causal mechanism. Moreover, we demonstrated that the causal effect is not identifiable from two environments if the exogenous noises of both $T$ and $U$ vary across environments (Section 3.6). Consequently, when two parameters change, it may be impossible to identify the causal effect using only two environments.
    

    }

    \item "This paper only proves unidentifiability for the case where $\{\epsilon_u, \epsilon_t\}$,  I wounder whether it is unidentifiable when any two of the four parameters change."
    
    {\color{blue}
    This is an interesting question. While we believe that the treatment effect remains unidentifiable when any two parameters—other than $\{\epsilon_u, \epsilon_t\}$—change, we currently do not have a clear construction, analogous to the $\{\epsilon_u, \epsilon_t\}$ case, to formally prove this. We will include this question as a potential direction for future research in the revised version.
    %It is an interesting question. We believe that the treatment effect is not unidentifiable when any two of the other parameters change, however we do not a clear construction similar to the case with $\{\epsilon_u, \epsilon_t\}$ that proves it. We will add this question as future research direction in the revised version.
    }

    \item "The authors mention some works leveraging optimization techniques to recover direct causes. I'm not familiar with this line of works, I will be grateful if the authors point out the superiority of their moment-based methods over these optimization-based methods. For instance, I guess the latter may get stuck in local optima easily."
    {\color{blue}
     As the reviewer pointed out, most methods based on optimization techniques may get stuck in poor local minima, since the objective functions in this setting are generally non-convex. In addition, as mentioned in the related work section, some of these methods either do not allow for latent confounding between covariates and the target variable, or assume that all causal coefficients in the linear model remain unchanged across environments. 
    %As pointed by the reviewer, most methods based on optimization techniques might get stuck in bad local minima as the objective functions in this setup is generally non-convex. Besides this, as we mentioned in the related work section, some of these methods do not allow latent confounding between covariates
%and the target variable or that all causal coefficients in the linear model remain unchanged. 
    }

    \item "I think it is unnecessary to discuss IRM, which is not very related to causal effect estimation."

    {\color{blue}
    Thank you for the suggestion. In the revised version, we will emphasize that IRM was introduced for designing robust predictive models in multi-environment settings, rather than for the problem of causal effect identification.
    %Thank you for the suggestion. In the revised version, we emphasize that IRM was introduced for designing robust predictive models in the multi-environemnt setting rather than for causal effect identification problem.
    }
\end{itemize}

\section{Reviewer 5}
\begin{itemize}
    \item \textbf{Q1 Summary And Contributions:}
    The paper considers the problem of estimating the causal effect of a treatment in the presence of latent confounders, when data are collected from heterogeneous environments where only certain aspects of the data-generating process vary. It shows that under a linear structural causal model, if only a single parameter—either a causal coefficient or an exogenous noise distribution—changes between environments, the treatment effect can be uniquely identified using novel moment-based algorithms. Conversely, when multiple parameters vary simultaneously, the causal effect is not identifiable.
    \item \textbf{Q3 Main Strengths:}
    One significant strength of the paper is its novel identification results that leverage higher-order moments. Specifically, the authors show that causal effects can be identified uniquely under certain conditions when data from multiple heterogeneous environments are available, and when only a single parameter or exogenous noise distribution varies across these environments.
    
    Further, the paper is easy to follow and well-written.
    \item \textbf{Q4 Main Weakness:}
    I believe the main weakness of the paper is the experimental evaluation. Specifically:
    \begin{itemize}
        \item The empirical analysis does not investigate the performance of the proposed method under realistic forms of model misspecification and/or assumption violations.
        \item The evaluation compares the proposed algorithms only to basic linear regression baselines. The absence of comparisons with other multi-environment causal inference algorithms limits our understanding.
        \item The empirical evaluation is limited to synthetic datasets.
        \item The evaluation does not discuss the computational complexity or runtime of the proposed methods.
    \end{itemize}
    \item \textbf{Q5 Detailed Comments To The Authors:} 
    Concerns on positioning and related work:
    \begin{itemize}
        \item The authors dedicate a significant amount of discussion to invariance-based methods that focus primarily on robust prediction, such as Invariant Risk Minimization (IRM), anchor regression, invariant causal prediction. Although these approaches are related conceptually, they fundamentally differ from the paper’s goal of identifying treatment effects.
        \item The related work section omits existing literature explicitly aimed at identifying causal treatment effects from multi-environment data. In particular, the authors should position their method against the works of [1,2,3,4]. Further, it would be good if the authors could expand their experimental evaluation to at least compare against [1] as a baseline.
        \item The setting of the authors—linear causal models with latent confounders—is closely related to the half-trek approach developed by [5]. This work address similar identification issues in linear structural equation models with latent variables and propose explicit graphical conditions (half-trek criteria) under which parameters become identifiable. Can the authors clarify the differences and similarities between their approach and the criteria proposed in [5]?
    \end{itemize}
    [1] Shi, Claudia, Victor Veitch, and David M. Blei. "Invariant representation learning for treatment effect estimation." In Uncertainty in artificial intelligence, pp. 1546-1555. PMLR, 2021.

    [2] De Bartolomeis, P., Kostin, J., Abad, J., Wang, Y., \& Yang, F. Doubly robust identification of treatment effects from multiple environments. In The Thirteenth International Conference on Learning Representations.
    
    [3] Shah, A., Shanmugam, K., \& Kocaoglu, M. (2023). Front-door adjustment beyond markov equivalence with limited graph knowledge. Advances in Neural Information Processing Systems, 36, 43800-43825.
    
    [4] Hartford, Jason S., et al. "Valid causal inference with (some) invalid instruments." International Conference on Machine Learning. PMLR, 2021.
    
    [5] Foygel, R., Draisma, J., \& Drton, M. (2012). Half-trek criterion for generic identifiability of linear structural equation models. The Annals of Statistics, 40(3), 1682–1713.

    \item \textbf{Q7 Justification For Your Score:} My overall assessment was mainly driven by the limited experimental evaluation and the lack of proper positioning in the literature. I would be happy to increase my score if the authors can adequately address these concerns.
\end{itemize}

\subsection{Answer}

\begin{itemize}
    \item "The evaluation does not discuss the computational complexity or runtime of the proposed methods."

    {\color{blue}
    We will add this discussion to the final version of the paper. More specifically, the computational complexity of Algorithms 1, 2, 3, and 4
    %within each iteration of the while loop
    is linear in the number of samples: every moment estimate can be computed in $O(N)$ time, where $N$ is the number of samples, and the number of loop iterations does not depend on $N$. Note that the total number of loop iterations depends on certain properties of the underlying distributions, which fall into two categories:
    %The Algorithms 1, 2, 3 and 4 are linear in terms of the number of samples, because all the moment estimation procedures can be done by $O(N)$ complexity ($N$ - number of samples). However, this in the Algorithms 1, 2, 3 and 4 present some while loops the length of which depend on the properties of case specific distributions. These properties fall into two category:
    \begin{itemize}
        \item number of loop iterations $\sim$ the order of the lowest moment at which the distribution that varies across the environments (according to the conditions of the theorem) differs between them (as in Algorithm 1);
        \item number of loop iterations $\sim$ the smallest $n$ such that $E[\epsilon^n]\neq (n-1)E[\epsilon^{n-2}]E[\epsilon^2]$, which indicates that the distribution is non-Gaussian (as in Algorithm 3 or in the procedure \textit{GetRatio}).
    \end{itemize}
    It is worth noting that in both cases, it is natural to assume that the order of the computed moments is bounded by some small constant (typically less than 10). Therefore, in practice, the overall complexity of the algorithms remains $O(N)$.
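
    As a worked illustration of the second category (the uniform example is ours and not taken from the paper): for a zero-mean Gaussian $\epsilon$ with variance $\sigma^2$, the recursion $E[\epsilon^n] = (n-1)E[\epsilon^{n-2}]E[\epsilon^2]$ holds for every $n \geq 2$, giving in particular $E[\epsilon^3] = 0$ and $E[\epsilon^4] = 3\sigma^4$. By contrast, for $\epsilon$ uniform on $[-a, a]$,
    \[
        E[\epsilon^2] = \frac{a^2}{3}, \qquad
        E[\epsilon^3] = 0, \qquad
        E[\epsilon^4] = \frac{a^4}{5} \;\neq\; 3\left(\frac{a^2}{3}\right)^{2} = \frac{a^4}{3} .
    \]
    The recursion therefore still holds at $n = 3$ (both sides are zero) and first fails at $n = 4$. Assuming the loop increases $n$ from $3$ in steps of one, it would stop after two iterations for this distribution, while any zero-mean distribution with nonzero skewness already violates the recursion at $n = 3$.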
    }
    \item "The evaluation compares the proposed algorithms only to basic linear regression baselines. The absence of comparisons with other multi-environment causal inference algorithms limits our understanding"

    {\color{blue}
    We would like to emphasize that the main contribution of our work is to establish identification results for the treatment effect using observations from multiple environments in the presence of latent confounding. To the best of our knowledge, no other work guarantees identification of the treatment effect from multiple environments in the absence of proxy variables or other additional information. The experimental results on synthetic data are provided merely to support the theoretical results. Applying the estimation algorithm to real data, or empirically analyzing the proposed method under model misspecification and/or assumption violations, could be a promising direction for future work.
    }

    \item "The related work section omits existing literature explicitly aimed at identifying causal treatment effects from multi-environment data. In particular, the authors should position their method against the works of [1,2,3,4]. Further, it would be good if the authors could expand their experimental evaluation to at least compare against [1] as a baseline."

    {\color{blue}
    We thank the reviewer for the suggested references; we will discuss them in the related work section.

    While the suggested papers explore interesting directions in treatment effect identification, we believe that a direct experimental comparison would not be possible due to substantial differences in problem settings. For instance, [1] assumes there are no unobserved confounders in the system, whereas our work specifically addresses the challenge posed by their presence. Moreover, [1] relies on the availability of observed covariates, instrumental variables, or other additional observed variables, while our main result operates under their absence.

Similarly, the work of [2] considers a fundamentally different set of assumptions for treatment effect identification across multiple environments. For instance, it assumes that all parents of the treatment or the outcome are observable, together with some additional assumptions on the observed distributions. Moreover, the identification process relies on properties of adjustment sets. We also note that [2] was uploaded to arXiv on March 18, 2025, a month after the UAI conference submission deadline.

Finally, papers [3] and [4] study the problem under a single-environment setting, assuming access to certain specific additional information. In particular, [3] investigates sufficient conditions for identifying treatment effects using front-door-like adjustment sets when structural side information is available. Meanwhile, [4] addresses treatment effect identification in the presence of instrumental variables, under the assumption that some of them may be invalid.
    
    %While [1] is  certainly an interesting line of research of treatment effect identification from multiple environments, however we do not think that experimental comparison of these approaches will be fair due to the significant differences in the settings. [1] assumes that there are no unobserved confounders presented in the system, while in our work the existence of the unobserved confounder is the main challenge. Additionally, [1] does not require a constraint on linearity of SCM, but they require the presence of the observed covariates, instrumental, or other additional observed variables while in our main result we assume the absence of them. Analogously, the work of [2] also consider generically different conditions for the problem of treatment effect identification from multiple environments. Please also note, that [2] was uploaded to arxiv on 18 Mar 2025 which is a month later after the submission deadline for the UAI conference. Finally, the works [3, 4] consider the problem under assumption of a single environment under the observation of some specific additional quantities. Specifically, [3] studies the sufficient conditions under which for the treatment effect identification can be used the front-door-like adjustment set given structural side information. [4] studies the problem of treatment effect identification in the presence of instrumental variables under the assumption that some of them may be invalid.
    }

    \item ``Can the authors clarify the differences and similarities between their approach and the criteria proposed in [5]?''

    {\color{blue}
    There are several fundamental differences between the problem and methods considered in our work and the ones considered in [5].
    
    Specifically, [5] considers the generic identifiability of all edge coefficients in the graph $G$ that represents the causal relationships among the observed variables. The authors assume observations from a single environment and that all distributions are Gaussian, while
    %the unobserved variables 
    %are modeled through the covariance between exogenous noise terms. These 
    unobserved confounding relations are modeled via the bidirected edges in $G$.
    %All these crucial properties are opposite to the ones we consider in our work. Additionally, in [5] the exogenous noise terms in the structural equations can be correlated, and are represented by bidirected edges in the graph $G$. 
    Note that the identified edge coefficients are represented by an adjacency matrix up to a transformation by some rational function $\psi(\cdot)$, which also differs from the definition of identifiability considered in our work. Moreover, the graph considered in our work is HTC-nonidentifiable (see [5] for the exact definition), since no set satisfies the half-trek criterion with respect to $v:=Y$.
    
    Further, the methods proposed in [5] do not use moments of order higher than 2. This is because a joint Gaussian distribution is fully determined by its mean and covariance matrix, which therefore contain ``all'' the information encoded in the observational data.
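
    To make this point concrete, we recall a standard fact about Gaussian vectors, stated in illustrative notation of our own ($X$, $\mu$, $\Sigma$, $t$) rather than that of [5]. For $X \sim \mathcal{N}(\mu, \Sigma)$, the cumulant generating function is
    \[
      \log \mathbb{E}\bigl[e^{t^\top X}\bigr] \;=\; t^\top \mu + \tfrac{1}{2}\, t^\top \Sigma\, t ,
    \]
    which is a quadratic polynomial in $t$. Hence all cumulants of order three and higher vanish, and moments of order higher than 2 carry no information beyond $\mu$ and $\Sigma$.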
    % Utilizing these properties of Gaussian distributions and acyclicity of the graph $G$, they derive the graphical criterion under which the graphical structure of $G$ can be recovered just from the observational data.
    }
\end{itemize}

\end{document}
