Balanced Mixed-Type Tabular Data Synthesis with Diffusion Models

TMLR Paper3537 Authors

22 Oct 2024 (modified: 01 Nov 2024)Under review for TMLREveryoneRevisionsBibTeXCC BY 4.0
Abstract: Diffusion models have emerged as a robust framework for various generative tasks, including tabular data synthesis. However, current tabular diffusion models tend to inherit bias in the training dataset and generate biased synthetic data, which may influence discriminatory actions. In this research, we introduce a novel tabular diffusion model that incorporates sensitive guidance to generate fair synthetic data with balanced joint distributions of the target label and sensitive attributes, such as sex and race. The empirical results demonstrate that our method effectively mitigates bias in training data while maintaining the quality of the generated samples. Furthermore, we provide evidence that our approach outperforms existing methods for synthesizing tabular data on fairness metrics such as demographic parity ratio and equalized odds ratio, achieving improvements of over $10\%$.
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=rDg5YLqeyw&referrer=%5BAuthor%20Console%5D(%2Fgroup%3Fid%3DTMLR%2FAuthors%23your-submissions)
Changes Since Last Submission: We forgot to add a co-author in the last submission.
Assigned Action Editor: ~Dennis_Wei1
Submission Number: 3537
Loading