[Re] DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative Networks

Anonymous

05 Feb 2022 (modified: 05 May 2023), ML Reproducibility Challenge 2021 Fall Blind Submission, Readers: Everyone

Keywords: GAN model, synthetic data

TL;DR: The paper by van Breugel et al. introduces DECAF, a GAN-based model for generating fair synthetic data.

Abstract:

Scope of Reproducibility: In the DECAF paper, the authors introduce a causal GAN-based model for generating synthetic data. Additionally, the paper describes a flexible causal approach for modifying this model such that it can generate fair data. Furthermore, van Breugel et al. guarantee that downstream models trained on the generated synthetic data can make fair predictions on both synthetic and real data. We aim to reproduce these claims.

Methodology: We started from the original codebase provided by the authors of the DECAF paper and tried to reproduce the results reported there. Our main focus was on using the GAN-based DECAF model to generate fair synthetic data. This had to work before we could obtain the other results, which involve debiasing. We ran experiments on the Adult dataset and the Credit Approval dataset, both of which are used in the original paper. Additionally, we tried to reproduce the causal graph discovered for the Credit Approval dataset using fast greedy equivalence search.

Results: The DECAF model proposed by the authors was trained on the Adult dataset and the Credit Approval dataset, using the original hyperparameters given in the paper. The architecture proposed in the paper relies on causal graphs, but these were not included in the GitHub repository. The paper names architectures to compare against DECAF but does not include code that would allow us to do so. Instead, we used a GAN architecture to generate new synthetic data, which gave us another method to compare DECAF against.

What was easy: The authors describe clearly how to remove edges from the Adult dataset's DAG in order to satisfy the fairness conditions. It was therefore easy to implement this for each of the fairness definitions in the DECAF paper. Furthermore, the paper's instructions on how to perform the evaluation are clear and easy to implement.

What was difficult: Most of the difficulty lay in getting the GAN-based framework to produce proper results. This was partly because we did not initially know that the model had to be provided with a DAG seed and a bias dict, so effort went into manually encoding the connections of the graph based on the graph provided in the paper.

Communication with original authors: No contact was sought with the authors of the paper. Contact with peers working on the same project was the most fruitful, in the sense that we were able to learn how they dealt with the issues we were both facing.
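The edge-removal step mentioned under "What was easy" and the manual graph encoding mentioned under "What was difficult" can be illustrated with a minimal sketch. It assumes the DAG is held as a list of (parent, child) pairs and that the edges to cut are grouped in a dict keyed by child node. The toy graph, the variable names, and the helper functions are all hypothetical and are not taken from the DECAF repository; the edges that each fairness definition actually requires removing are specified in the original paper and are not reproduced here.

```python
from collections import defaultdict

# Hypothetical toy DAG as (parent, child) edges; NOT the Adult graph from the paper.
dag_edges = [
    ("sex", "occupation"),
    ("sex", "income"),
    ("education", "occupation"),
    ("education", "income"),
    ("occupation", "income"),
]


def to_bias_dict(edges_to_remove):
    """Group the edges to cut by child node: {child: [parents whose edge is cut]}."""
    bias = defaultdict(list)
    for parent, child in edges_to_remove:
        bias[child].append(parent)
    return dict(bias)


def remove_edges(edges, bias_dict):
    """Return a new edge list without the edges named in bias_dict."""
    return [(p, c) for p, c in edges if p not in bias_dict.get(c, [])]


def has_path(edges, source, target):
    """Depth-first search for a directed path from source to target."""
    children = defaultdict(list)
    for p, c in edges:
        children[p].append(c)
    stack, seen = [source], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children[node])
    return False


# Assumption for illustration: cut only the direct edge from the
# protected attribute ("sex") to the target ("income").
bias_dict = to_bias_dict([("sex", "income")])
debiased_edges = remove_edges(dag_edges, bias_dict)
```

In this toy graph, cutting only the direct edge still leaves an indirect path from "sex" to "income" through "occupation", which `has_path` can confirm. A stricter fairness definition would therefore need more entries in the dict.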

Paper Url: https://arxiv.org/abs/2110.12884

Paper Venue: NeurIPS 2021
