Continuous Bayesian Model Selection for Multivariate Causal Discovery

Anish Dhir; Ruby Sedgwick; Avinash Kori; Ben Glocker; Mark van der Wilk

Published: 01 May 2025, Last Modified: 18 Jun 2025. Venue: ICML 2025 poster. License: CC BY 4.0

TL;DR: Bayesian model selection for multivariate causal discovery allows the use of more flexible models, which leads to better performance.

Abstract: In the absence of interventional data, current causal discovery approaches require restrictive model assumptions to ensure structure identifiability. These assumptions often do not hold in real-world applications, leading to a loss of guarantees and poor performance in practice. Recent work has shown that, in the bivariate case, Bayesian model selection can greatly improve performance by exchanging restrictive modelling assumptions for more flexible ones, at the cost of a small probability of making an error. Our work shows that this approach is also useful in the important multivariate case. We propose a scalable algorithm that leverages a continuous relaxation of the discrete model selection problem. Specifically, we employ the Causal Gaussian Process Conditional Density Estimator (CGP-CDE) as a Bayesian non-parametric model, using its hyperparameters to construct an adjacency matrix. This matrix is then optimised using the marginal likelihood and an acyclicity regulariser, giving the maximum a posteriori causal graph. We demonstrate the competitiveness of our approach, showing that it is advantageous to perform multivariate causal discovery without infeasible assumptions using Bayesian model selection.
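
Illustrative note: the abstract does not state which acyclicity regulariser is used. As a minimal sketch of the general idea, the snippet below assumes a polynomial trace penalty of the kind common in continuous structure learning, h(A) = tr((I + (A∘A)/d)^d) - d, which is zero when the weighted adjacency matrix A contains no directed cycle and positive otherwise. The function names and the example matrices are hypothetical and are not taken from the paper or its code repository.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity_penalty(adjacency):
    """Return tr((I + (A * A) / d)^d) - d for a d x d weighted adjacency matrix.

    Assumed penalty form (not confirmed by the abstract): the value is 0 when
    the graph has no directed cycle and grows with the weight of any cycles.
    """
    d = len(adjacency)
    # Squaring each entry keeps weights non-negative, so cycles cannot cancel.
    base = [[(1.0 if i == j else 0.0) + adjacency[i][j] ** 2 / d
             for j in range(d)] for i in range(d)]
    # Start from the identity and multiply by `base` d times.
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(d):
        power = matmul(power, base)
    return sum(power[i][i] for i in range(d)) - d


# Hypothetical three-variable examples: entry [i][j] is the strength of i -> j.
chain = [[0.0, 0.8, 0.0], [0.0, 0.0, 0.5], [0.0, 0.0, 0.0]]  # X1 -> X2 -> X3
cycle = [[0.0, 0.8, 0.0], [0.0, 0.0, 0.5], [0.3, 0.0, 0.0]]  # adds X3 -> X1

print(acyclicity_penalty(chain))  # no directed cycle
print(acyclicity_penalty(cycle))  # contains the cycle X1 -> X2 -> X3 -> X1
```

In a continuous relaxation of this kind, a penalty like this would typically be added, with a weight, to the negative log marginal likelihood so that gradient-based optimisation is steered towards acyclic graphs; how the paper weights or schedules the regulariser is not specified in the abstract.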

Lay Summary: Scientists often want to understand cause-and-effect relationships from data, such as whether smoking causes cancer or whether education leads to higher income. Current methods for discovering these relationships from observational data require very strict assumptions that rarely hold in real-world situations, leading to unreliable results. We developed a new approach that uses Bayesian statistics to relax these assumptions while still providing reliable answers. Our method uses advanced machine learning techniques to model the relationships between multiple variables simultaneously, then finds the most likely cause-and-effect structure. Our technique outperforms existing methods in settings where their strict assumptions do not hold. This makes causal discovery more reliable and accessible for scientists studying complex systems with multiple interacting factors, from medical research to economics.

Link To Code: https://github.com/Anish144/ContinuousBMSStructureLearning

Primary Area: General Machine Learning->Causality

Keywords: bayesian model selection, gaussian process, causal discovery

Submission Number: 4001
