Causal Pathway-Integrated Generative Adversarial Networks for Counterfactually Fair Data Generation

Published: 2025, Last Modified: 28 Jan 2026. Venue: ICIC (10) 2025. License: CC BY-SA 4.0
Abstract: Existing methods for generating fair synthetic data with Generative Adversarial Networks (GANs) reduce the discriminability of the training data. However, because they randomize unobservable variables, these methods often yield incomplete causal models. In particular, randomization fails to account for all relevant factors that influence the data generation process, which results in residual biases and inaccurate fairness assessments. To address this issue, we propose CPIGAN, a GAN-based framework for generating counterfactually fair synthetic data. CPIGAN first employs a Variational Autoencoder (VAE)-based inference model to capture unobservable variables that may influence decision fairness. It then uses a GAN, grounded in the causal graph and the inferred unobservable variables, to conditionally reconstruct each variable from its causal parents. In doing so, CPIGAN facilitates the generation of counterfactually fair data and effectively mitigates biases caused by unobservable factors. Empirical evaluations on real-world datasets demonstrate that CPIGAN significantly reduces biases associated with sensitive attributes. In particular, compared to existing generative networks, CPIGAN achieves average improvements of 72.36% in Maximum Mean Discrepancy (MMD) and 43.67% in Wasserstein distance (Wass).
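The abstract reports improvements in MMD and Wasserstein distance but does not say how either is computed. As a rough illustration only, the Python sketch below shows one common way to estimate both quantities between two samples, for example generated outcomes under two values of a sensitive attribute. The RBF kernel, the `bandwidth` parameter, the biased estimator, the equal-sample-size restriction, and all function names are assumptions made for this example; none of them is taken from the paper.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and b.

    The kernel choice and bandwidth are assumptions for this sketch.
    """
    # Pairwise squared Euclidean distances via the expansion
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples x and y."""
    kxx = rbf_kernel(x, x, bandwidth).mean()
    kyy = rbf_kernel(y, y, bandwidth).mean()
    kxy = rbf_kernel(x, y, bandwidth).mean()
    return float(kxx + kyy - 2.0 * kxy)


def wasserstein_1d(u, v):
    """1-Wasserstein distance between two equal-size 1-D samples.

    For equal-size empirical distributions this is the mean absolute
    difference between the sorted values.
    """
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    if u.shape != v.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(u - v)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins for outcomes generated under two values
    # of a sensitive attribute; not data from the paper.
    group_a = rng.normal(0.0, 1.0, size=(200, 1))
    group_b = rng.normal(0.5, 1.0, size=(200, 1))
    print("MMD^2:", mmd_squared(group_a, group_b))
    print("W1:", wasserstein_1d(group_a[:, 0], group_b[:, 0]))
```

Under both measures, smaller values mean the two samples are closer in distribution, which is the sense in which a lower score would indicate less dependence on the sensitive attribute.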