Differentiable Causal Discovery of Linear Non-Gaussian Acyclic Models Under Unmeasured Confounding

TMLR Paper 4056 Authors

26 Jan 2025 (modified: 28 May 2025) · Under review for TMLR · CC BY 4.0
Abstract: We propose a score-based method that extends the framework of the linear non-Gaussian acyclic model (LiNGAM) to address the problem of causal structure estimation in the presence of unmeasured variables. Building on the method proposed by Bhattacharya et al. (2021), we develop a method called ABIC LiNGAM, which assumes that error terms follow a multivariate generalized normal distribution and employs continuous optimization techniques to recover acyclic directed mixed graphs (ADMGs). We demonstrate that the proposed method can estimate causal structures, including the possibility of identifying their orientations, rather than only Markov equivalence classes, under the assumption that the data are linear and follow a multivariate generalized normal distribution. Additionally, we provide proofs of the identifiability of the parameters in ADMGs when the error terms follow a multivariate generalized normal distribution. The effectiveness of the proposed method is validated through simulations and experiments using real-world data.
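The kind of score-based continuous optimization the abstract describes can be sketched as follows. This is an illustrative simplification, not the authors' ABIC LiNGAM implementation: it assumes independent univariate generalized-normal errors per node (the paper treats the multivariate case with confounding), and the names `gennorm_score` and `acyclicity` are ours. The acyclicity term is the standard NOTEARS-style trace-of-matrix-exponential constraint used in this line of work.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative sketch only: a differentiable score for the linear SEM
# x = B^T x + e under (univariate, per-node) generalized-normal errors.
# Function names are ours, not the paper's.

def acyclicity(B):
    """NOTEARS-style constraint h(B) = tr(exp(B * B)) - d; zero iff acyclic."""
    d = B.shape[0]
    return np.trace(expm(B * B)) - d

def gennorm_score(X, B, beta=1.5, scale=1.0, lam=0.1):
    """Penalized negative log-likelihood of residuals r = X - X @ B.

    A generalized-normal density is proportional to exp(-(|e|/scale)^beta),
    so the NLL contributes sum(|r/scale|^beta); an L1 term on B encourages
    sparsity.  beta = 2 recovers the Gaussian case, beta = 1 the Laplace case.
    """
    r = X - X @ B
    return np.sum(np.abs(r / scale) ** beta) + lam * np.sum(np.abs(B))

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))
X[:, 1] = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(500)   # x1 -> x2

B_true = np.array([[0.0, 2.0], [0.0, 0.0]])                # acyclic
B_cyclic = np.array([[0.0, 2.0], [0.5, 0.0]])              # contains a 2-cycle
print(acyclicity(B_true), acyclicity(B_cyclic))
print(gennorm_score(X, B_true), gennorm_score(X, np.zeros((2, 2))))
```

In a full method, `gennorm_score` would be minimized subject to `acyclicity(B) = 0` (e.g. via an augmented Lagrangian); here the true adjacency matrix satisfies the constraint exactly and attains a lower score than the empty graph.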
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: Based on the comments we received, we have made the following revisions. The differences from the initial submission are color-coded so that the changes are clear.

Reviewer 1
① "ABIC LiNGAM vs. BANG: tiny performance gap." Even a small empirical gap masks four practical advantages of ABIC LiNGAM:
- Single score optimisation: avoids multiple statistical tests and the propagation of type-I/II errors.
- Easy injection of prior knowledge: domain constraints can be added as simple numerical penalties.
- Unified Gaussian/non-Gaussian treatment: one framework yields the Markov equivalence class under Gaussian errors and full orientation under non-Gaussian errors.
- High extensibility: non-linearities or mixed data types require only loss-function tweaks, not new tests.
② "Background vs. novel contribution unclear." Moved the description of ABIC (Bhattacharya et al., 2021) to the Related Work section; §4.2 now focuses exclusively on our new extensions (non-Gaussian errors, generalised-normal parameters).
③ "Behaviour under Gaussian errors." With Gaussian errors the method recovers one member of the Markov equivalence class (directions are unidentifiable), but the same optimisation routine remains valid, preserving a coherent workflow.
④ "Non-linear structural functions." Extra experiments with polynomial terms show a high skeleton F-score (0.901) but lower edge-direction accuracy; BANG sometimes outperforms here. Future work: non-linear loss terms and theoretical guarantees.
⑤ "Linear structure + heavy-tailed errors." When the true shape parameter β is known (e.g. t-distribution with ν = 3), ABIC LiNGAM is best on all metrics. If β is estimated and mis-specified, direction accuracy drops; practitioners should grid-search β or use prior information.

Reviewer 2
① "Literature coverage too narrow." Added ADMG work (Richardson & Spirtes, 2002; Tashiro et al., 2014) and recent score-based methods (Nowzohour, 2017; Bernstein, 2020; Chen, 2021; Claassen, 2022; Ng, 2024). Contrasted our method with BANG and ABIC, stressing that non-Gaussianity enables full identification.
② "Redundant text / missing definitions." Removed duplicated Brito & Pearl (2002) sentences; created a glossary subsection for "bow-free ADMG", "Markov equivalence", and "non-Gaussianity".
③ "Algorithm 2 resembles prior work." ABIC details now live in Related Work; Algorithm 2 explicitly highlights the new β ≠ 1 term, which handles the non-Gaussian cases.
④⑤ "Notation issues." Corrected Z_{-i} = Ω_{-i,-i}^{-1} ε_{-i}; defined tr(A) immediately after Eq. (22).
⑥ "Algorithm formatting." Split the algorithms into stages (A) pseudo-variables, (B) parameter optimisation, and (C) convergence; indented and annotated the differences between Algorithms 1 and 2.
⑦ "Lack of structure / emphasis." Inserted Key Point and (Intuition) callouts at critical junctures; bolded the main take-aways and used bullet lists.
⑧ "Large-sample anomalies (Fig. 3)." Explained that larger samples increase power and over-detect edges when the penalty is fixed; the appendix now shows 2k- and 5k-sample results and how stricter penalties restore precision.
⑨ "Need a consistency theorem." Stated a plan to prove skeleton consistency for bow-free ADMGs with non-Gaussian errors in future work; the current manuscript provides empirical evidence only.

Reviewer 3
Added two explicit statements (end of Introduction and end of Conclusion) underscoring that introducing non-Gaussian errors is a deliberate strategy, not a mere relaxation: higher-order moments unlock causal directions even with latent confounding.

Overall: (i) foreground the methodological advantages and extensibility of ABIC LiNGAM; (ii) restructure the manuscript to cleanly separate prior work from contributions; (iii) fix notation and formatting; (iv) expand the experiments and practical guidance; (v) emphasise why non-Gaussianity is pivotal for full identifiability.
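The practical advice on grid-searching the shape parameter β (Reviewer 1, point ⑤) can be sketched with SciPy's `gennorm` distribution. This is a hypothetical helper, not the paper's code: for each candidate β on a grid, location and scale are fit by maximum likelihood with the shape held fixed, and the β with the highest log-likelihood on the residuals is selected.

```python
import numpy as np
from scipy.stats import gennorm

# Hypothetical helper illustrating the grid-search advice: pick the
# generalized-normal shape beta that maximizes the residual log-likelihood.

def select_beta(residuals, grid=(0.5, 1.0, 1.5, 2.0, 3.0)):
    """Return the beta on `grid` with the highest ML log-likelihood,
    fitting loc and scale for each fixed-shape candidate."""
    best_beta, best_ll = None, -np.inf
    for beta in grid:
        # f0 fixes the first shape parameter (beta) during the fit.
        loc, scale = gennorm.fit(residuals, f0=beta)[1:]
        ll = gennorm.logpdf(residuals, beta, loc, scale).sum()
        if ll > best_ll:
            best_beta, best_ll = beta, ll
    return best_beta

rng = np.random.default_rng(1)
r_laplace = rng.laplace(size=2000)   # Laplace = generalized normal, beta = 1
print(select_beta(r_laplace))
```

With heavy-tailed residuals such as the Laplace draw above, the selected β should be well below the Gaussian value of 2, matching the point that a mis-specified β degrades direction accuracy while a grid-searched one does not.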
Assigned Action Editor: ~Sergey_Plis1
Submission Number: 4056