On the causality-preservation capabilities of generative modelling

Yves-Cédric Bauwelinckx, Jan Dhaene, Milan van den Heuvel, Tim Verdonck

Published: 01 Jan 2025, Last Modified: 16 May 2025, Venue: J. Comput. Appl. Math. 2025, License: CC BY-SA 4.0

Abstract: Modelling is essential in both the financial and insurance industries. The emergence of machine learning and deep learning models offers new tools for this task, but these models often require large datasets that are typically unavailable in business fields due to privacy and ethical concerns. This lack of data is currently one of the main hurdles in developing better models. Generative models, such as Generative Adversarial Networks (GANs), can address this issue by creating synthetic data that can be freely shared. While GANs are widely studied in fields like computer vision, their use in business is limited, primarily because business questions often focus on identifying causal effects, whereas GANs and neural networks typically emphasise high-dimensional correlations. This paper explores whether GANs can produce synthetic data that reliably answers causal questions by performing causal analyses on GAN-generated data under varying assumptions. The study covers cross-sectional, time series, and complete structural model scenarios. The findings show that while basic GANs replicate causal relationships in simple cross-sectional data, they struggle with more complex structural models. In contrast, CausalGAN effectively replicates the original causal model, whereas TimeGAN modifies the causal representation in time series data.
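
The idea of running the same causal analysis on original and synthetic data can be illustrated with a minimal sketch. This is not the paper's experimental setup: the data-generating process, the function names (`simulate_linear_scm`, `sample_gaussian_stand_in`, `ols_slope`), and the effect size are all assumptions chosen for illustration, and a fitted multivariate Gaussian stands in for a trained GAN so that the example stays self-contained. It only shows the comparison step in a simple cross-sectional case, where a known causal effect of X on Y is estimated once on the original sample and once on the generated sample.

```python
import numpy as np


def simulate_linear_scm(n, beta=2.0, seed=0):
    """Hypothetical cross-sectional model: X -> Y with causal effect beta."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = beta * x + rng.normal(size=n)
    return np.column_stack([x, y])


def sample_gaussian_stand_in(data, n, seed=1):
    """Stand-in generator (NOT a GAN): fit a multivariate Gaussian to the
    data and draw n synthetic rows from it."""
    rng = np.random.default_rng(seed)
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n)


def ols_slope(data):
    """Estimate the effect of X (column 0) on Y (column 1) by least squares
    with an intercept."""
    x, y = data[:, 0], data[:, 1]
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


real = simulate_linear_scm(5000)
synthetic = sample_gaussian_stand_in(real, 5000)

slope_real = ols_slope(real)
slope_synth = ols_slope(synthetic)

print(f"effect estimated on original data:  {slope_real:.3f}")
print(f"effect estimated on synthetic data: {slope_synth:.3f}")
```

In this linear-Gaussian toy case the two estimates should be close, because the stand-in generator preserves the covariance structure that the regression relies on. Agreement here says nothing about richer structural models, where matching correlations does not guarantee matching causal relationships.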