Reproduction study - DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative Networks

Anonymous

05 Feb 2022 (modified: 05 May 2023) · ML Reproducibility Challenge 2021 Fall Blind Submission · Readers: Everyone
Abstract: Reproducibility Summary

Scope of Reproducibility
We conduct a reproduction study of the paper "DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative Networks". We investigate the following claims: (1) DECAF is versatile and can remove undesired bias under several definitions of fairness, including Fairness Through Unawareness (FTU), Demographic Parity (DP) and Conditional Fairness (CF), as measured by FTU and DP metrics; (2) DECAF generates fair synthetic data of higher quality than existing methods, as measured by the precision, recall and AUROC of a downstream classifier; (3) DECAF relies on a causal structure provided as a Directed Acyclic Graph (DAG), is robust to incorrect DAG specification, and any DAG from the Markov equivalence class found through causal discovery can be used for data generation.

Methodology
We repeated the authors' main experiments on two data sets and added several experiments of our own. We largely used the authors' provided code as a base, but found various discrepancies between the paper and the code. Moreover, the provided code could not yet produce results, so we had to modify it; our codebase therefore consists of a combination of new and original code.

Results
Overall, our study shows that the DECAF method removes undesired bias as claimed by the authors. We could reproduce the broad trends reported for the Adult data set, but not for the Credit data set. Our results clearly show that DECAF can generate fair synthetic data while maintaining high downstream utility on the first data set.

What was easy
The paper is intuitively written and includes clear figures that supplement the explanation of the method, as well as extensive appendices. The data sets used for the experiments were provided and easy to find.
What was difficult
The main difficulty was working with the code provided by the authors, which was an incomplete implementation of the DECAF pipeline and did not yet run. There were also various discrepancies between the paper and the code, e.g. additional terms in the training loss and the use of an XGBoost classifier instead of a Multilayer Perceptron. Debugging and completing the code required a significant amount of effort and took more time than expected.

Communication with original authors
The original authors answered some of our questions about the incomplete code and confirmed that the code base is still under development. They kindly gave us additional code, but unfortunately this did not solve our problems.
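The FTU and DP metrics mentioned in the claims above can be illustrated on a downstream classifier. The following is a minimal sketch, not the authors' implementation: the data is randomly generated for illustration, and the metric definitions assumed here are DP as the gap in positive prediction rates between protected groups, and FTU as the change in predictions when only the protected attribute is flipped at test time.

```python
# Illustrative sketch of DP and FTU fairness metrics on a downstream
# classifier. Data and variable names are hypothetical, not from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
A = rng.integers(0, 2, n)                         # protected attribute
X = rng.normal(size=(n, 3)) + A[:, None] * 0.5    # features correlated with A
y = (X[:, 0] + 0.8 * A + rng.normal(size=n) > 0.7).astype(int)  # biased label

clf = LogisticRegression().fit(np.column_stack([A, X]), y)

# Demographic Parity: |P(yhat=1 | A=0) - P(yhat=1 | A=1)|
yhat = clf.predict(np.column_stack([A, X]))
dp = abs(yhat[A == 0].mean() - yhat[A == 1].mean())

# FTU: flip the protected attribute at prediction time, holding X fixed,
# and compare the resulting positive prediction rates
yhat_a0 = clf.predict(np.column_stack([np.zeros(n), X]))
yhat_a1 = clf.predict(np.column_stack([np.ones(n), X]))
ftu = abs(yhat_a0.mean() - yhat_a1.mean())

print(f"DP  = {dp:.3f}")
print(f"FTU = {ftu:.3f}")
```

On data generated fairly (e.g. by DECAF under the corresponding fairness constraint), both quantities should be close to zero; on the biased data above they are not.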
Paper Url: https://proceedings.neurips.cc/paper/2021/file/ba9fab001f67381e56e410575874d967-Paper.pdf
Paper Venue: NeurIPS 2021