FDA: Generating Fair Synthetic Data with Provable Trade-off between Fairness and Faithfulness

24 Sept 2024 (modified: 17 Nov 2024) | ICLR 2025 Conference Withdrawn Submission | CC BY 4.0
Keywords: Fairness, Fair synthetic data generation, Joint modeling, Faithfulness, Trade-off between fairness and faithfulness
TL;DR: We propose a novel framework called FDA for generating Fair synthetic data through Data Augmentation, offering the first method with a provable trade-off guarantee between fairness and faithfulness.
Abstract: We propose a novel framework called FDA for generating Fair synthetic data through Data Augmentation, offering the first method with a provable trade-off guarantee between fairness and faithfulness. Unlike existing methods, our approach uses a novel joint model consisting of two sub-models: one enforces strict fairness constraints while the other preserves fidelity to the original data, coupled with a tuning mechanism that provides explicit control over the trade-off between fairness and faithfulness. Specifically, our FDA framework enables explicit quantification of the extent to which the generated fair synthetic data preserve faithfulness to the original data, while achieving an intermediate level of fairness determined by a user-specified parameter $\alpha \in [0, 1]$. Theoretically, we show that the resulting fair synthetic data converge to the original data in probability as $\alpha$ tends to 1, which implies convergence in distribution. Our framework can also be combined with GAN-based fair models, such as DECAF, to further improve the utility of the resulting synthetic data in downstream analysis while carefully balancing fairness. Furthermore, we derive an upper bound on the unfairness of downstream models trained on the generated fair synthetic data, which can help users choose an appropriate $\alpha$. Finally, we perform numerical experiments on benchmark data to validate our theoretical contributions and to compare FDA with other methods.
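To make the role of $\alpha$ concrete, the following is a minimal illustrative sketch only, not the paper's actual algorithm: it assumes two black-box samplers (the names `sample_fair`, `sample_faithful`, and `generate_alpha_mixture` are hypothetical) standing in for the fairness-constrained and fidelity-preserving sub-models, and mixes their outputs so that $\alpha = 1$ recovers the faithful generator, consistent with the convergence claim in the abstract.

```python
import numpy as np

def generate_alpha_mixture(sample_fair, sample_faithful, alpha, n, seed=0):
    """Draw n synthetic rows by mixing a fair and a faithful generator.

    With probability alpha each row is taken from the fidelity-preserving
    sub-model, otherwise from the fairness-constrained one, so alpha = 1
    reproduces the faithful generator and alpha = 0 the fair one.
    """
    rng = np.random.default_rng(seed)
    n_faithful = rng.binomial(n, alpha)      # rows drawn from the faithful sub-model
    parts = []
    if n_faithful > 0:
        parts.append(sample_faithful(n_faithful))
    if n - n_faithful > 0:
        parts.append(sample_fair(n - n_faithful))
    data = np.concatenate(parts, axis=0)
    rng.shuffle(data, axis=0)                # interleave rows from both sub-models
    return data


# Toy usage: two Gaussian "generators" stand in for trained sub-models.
if __name__ == "__main__":
    fair = lambda k: np.random.normal(0.0, 1.0, size=(k, 2))
    faithful = lambda k: np.random.normal(0.5, 1.0, size=(k, 2))
    synth = generate_alpha_mixture(fair, faithful, alpha=0.7, n=1000)
    print(synth.shape)  # (1000, 2)
```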
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3909