Keywords: Bayesian nonparametric, Dirichlet process, Differential privacy, Tabular data generation
TL;DR: Incorporating privacy as well as fairness through Bayesian nonparametric learning
Abstract: A fundamental challenge in data synthesis is protecting the fairness and privacy of
the individual, particularly in data-scarce environments where underrepresented
groups are at risk of further marginalization through the reproduction of biases inherent in
the data modeling process. We introduce a privacy- and fairness-aware framework for a class
of generative models, which fuses a conditional generator with the framework
of Bayesian nonparametric learning (BNPL). This conditional structure imposes
fairness constraints in our generative model by minimizing the mutual information
between generated outcomes and protected attributes. Unlike existing methods
that primarily focus on sensitive binary-valued attributes, our framework extends
seamlessly to non-binary attributes. Moreover, our method provides a systematic
solution to class imbalance, ensuring adequate representation of underrepresented
protected groups. Our proposed approach offers a scalable, privacy-preserving
framework for ethical and equitable data generation, which we demonstrate through
theoretical guarantees and extensive experiments on sensitive empirical examples.
Primary Area: generative models
Submission Number: 16600