Strong and Weak Identifiability of Optimization-based Causal Discovery in Non-linear Additive Noise Models

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
Abstract: Causal discovery aims to identify causal relationships from observational data. Recently, optimization-based causal discovery methods have attracted extensive attention in the literature due to their efficiency in handling high-dimensional problems. However, we observe that optimization-based methods often perform well on certain problems but struggle with others. This paper identifies a specific characteristic of causal structural equations that determines the difficulty of identification in causal discovery and, in turn, the performance of optimization-based methods. We conduct an in-depth study of the additive noise model (ANM) and propose to further divide identifiable problems into strongly and weakly identifiable types based on the difficulty of identification. We also provide a sufficient condition to distinguish the two categories. Inspired by these findings, this paper further proposes GENE, a generic method for addressing strongly and weakly identifiable problems in a unified way under the ANM assumption. GENE adopts an order-based search framework that incorporates conditional independence tests into order fitness evaluation, ensuring effectiveness on weakly identifiable problems. In addition, GENE restricts the dimensionality of the effect variables to ensure scale invariance, a property crucial for practical applications. Experiments demonstrate that GENE is uniquely effective in addressing weakly identifiable problems while also remaining competitive with state-of-the-art causal discovery algorithms for strongly identifiable problems.
Lay Summary: Figuring out cause-and-effect relationships from data is crucial across many scientific fields, but current automated methods often struggle with complex non-linear systems and perform inconsistently. Our research traces this inconsistency to how "identifiable" the causal links are, which leads us to distinguish between "strongly" and "weakly" identifiable problems. We developed GENE, a unified method that first proposes a potential causal order of the variables and then evaluates it by combining how well it fits the data with statistical independence tests, so that it remains robust even when the scales of the data vary. This significantly improves the discovery of correct causal links, particularly in challenging "weakly identifiable" scenarios where many existing methods fail, and gives scientists a more dependable tool for understanding the underlying mechanisms of complex systems.
Primary Area: General Machine Learning->Causality
Keywords: Optimization-based Causal Discovery, Identifiability, Additive Noise Model, Order Search, Implicit Function
Submission Number: 11188
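
Illustrative sketch (not the GENE algorithm): the abstract describes scoring candidate variable orders with independence tests under the additive noise model. The toy example below shows that general idea for two variables only. Everything in it is an assumption made for illustration: the data-generating function, the polynomial regressor, the HSIC dependence measure, and the helper names `hsic` and `order_score` do not come from the paper.

```python
import numpy as np


def hsic(a, b, bandwidth=1.0):
    """Biased HSIC estimate with Gaussian kernels on standardized 1-D samples.

    Larger values indicate stronger statistical dependence between a and b.
    """
    def gram(v):
        v = (v - v.mean()) / v.std()          # standardize, so units do not matter
        d2 = (v[:, None] - v[None, :]) ** 2   # pairwise squared distances
        return np.exp(-d2 / (2.0 * bandwidth ** 2))

    n = len(a)
    h = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    return np.trace(gram(a) @ h @ gram(b) @ h) / n ** 2


def order_score(cause, effect, degree=3):
    """Hypothetical fitness of the order cause -> effect (lower is better).

    Fits effect = poly(cause) + residual and measures how strongly the
    residual still depends on the assumed cause.
    """
    coeffs = np.polyfit(cause, effect, degree)
    residual = effect - np.polyval(coeffs, cause)
    return hsic(cause, residual)


# Toy additive noise model (assumed for this sketch): y = x^3 + noise.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=400)
y = x ** 3 + rng.normal(0.0, 1.0, size=400)

score_xy = order_score(x, y)   # candidate order x -> y
score_yx = order_score(y, x)   # candidate order y -> x
best_order = "x -> y" if score_xy < score_yx else "y -> x"

print("dependence score for x -> y:", score_xy)
print("dependence score for y -> x:", score_yx)
print("preferred order:", best_order)
```

In the true causal direction the residual is approximately the independent noise term, so its dependence score is small. In the reverse direction no additive-noise fit leaves an independent residual, so the score is larger. Because this toy score standardizes its inputs and least-squares polynomial fits rescale with the data, it does not change when either variable is expressed in different units. This only illustrates what scale invariance means and is not a statement about how GENE achieves it.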