Supervised learning for causal discovery from observational data often achieves competitive performance despite seemingly avoiding the explicit assumptions that traditional methods require for identifiability. In this work, we analyze CSIvA (Ke et al., 2023b), a transformer architecture for amortized inference that promises to train on synthetic data and transfer to real data, on bivariate causal models. First, we bridge the gap with identifiability theory, showing that the training distribution implicitly defines a prior on the causal model of the test observations: consistent with classical approaches, good performance is achieved when the prior matches the test data and the underlying model is identifiable. Second, we find that CSIvA cannot generalize to classes of causal models unseen during training: to overcome this limitation, we show that training on datasets generated from different types of causal models, each unambiguously identifiable in isolation, improves test generalization. We complement this empirical evidence with theory, showing that the ambiguous cases arising from the mixture of identifiable causal models are unlikely to occur. Overall, we find that amortized causal discovery still adheres to identifiability theory, contradicting the hypothesis of Lopez-Paz et al. (2015) that supervised learning methods could overcome its restrictions.
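To make the training setup concrete, below is a minimal sketch (our own illustration, not the paper's code) of how supervised training data for bivariate amortized causal discovery might be generated from a mixture of model classes that are each identifiable in isolation. The specific mechanisms, noise choices, and function names (`linear_non_gaussian`, `nonlinear_additive`, `sample_dataset`) are illustrative assumptions, not CSIvA's actual data-generating process.

```python
# Hypothetical sketch: synthetic bivariate datasets drawn from a mixture of
# identifiable causal model classes, each labeled with the true direction.
import numpy as np

rng = np.random.default_rng(0)

def linear_non_gaussian(n):
    # LiNGAM-style class: linear mechanism with non-Gaussian (uniform) noise,
    # identifiable on its own.
    cause = rng.uniform(-1, 1, n)
    effect = rng.uniform(0.5, 2.0) * cause + 0.3 * rng.uniform(-1, 1, n)
    return cause, effect

def nonlinear_additive(n):
    # Nonlinear additive noise class: nonlinear mechanism with Gaussian noise,
    # also identifiable on its own.
    cause = rng.normal(0.0, 1.0, n)
    effect = np.tanh(2.0 * cause) + 0.3 * rng.normal(0.0, 1.0, n)
    return cause, effect

def sample_dataset(n=500):
    # Mix the two identifiable classes and randomize the causal direction;
    # the direction label provides the supervision signal.
    mechanism = rng.choice([linear_non_gaussian, nonlinear_additive])
    cause, effect = mechanism(n)
    if rng.random() < 0.5:
        return np.stack([cause, effect], axis=1), 0  # label 0: X -> Y
    return np.stack([effect, cause], axis=1), 1      # label 1: Y -> X

data, label = sample_dataset()
print(data.shape, label)  # (500, 2) plus a direction label
```

Under this reading, the training distribution over `sample_dataset` outputs is exactly the implicit prior the abstract refers to: a learner trained only on one class would face the generalization failure described above, while mixing identifiable classes enlarges the prior's support without, per the paper's theoretical argument, introducing ambiguous cases except on a negligible set.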