Keywords: differential privacy, data synthesis
TL;DR: This paper provides a modularized benchmark for differentially private tabular data synthesis algorithms.
Abstract: Differentially private (DP) tabular data synthesis algorithms generate artificial data that preserves the statistical properties of private data while safeguarding individual privacy. However, the many diverse algorithms that have emerged in recent years make practical application difficult: data processing methods are inconsistent across works, and in-depth comparison and analysis of the algorithms is lacking. These factors create significant obstacles to selecting an appropriate algorithm. In this paper, we address these challenges by proposing a novel benchmark for evaluating tabular data synthesis methods. We present a unified evaluation framework that integrates data preprocessing, feature selection, and data synthesis modules, facilitating fair and comprehensive comparisons. Our evaluation reveals that no single method consistently outperforms the rest across all scenarios. Furthermore, we conduct an in-depth experimental evaluation of each algorithmic module, offering insights into the strengths and limitations of different strategies. This lays the foundation for designing more robust and interpretable methods for private data synthesis. Source code is available at the anonymous link: https://anonymous.4open.science/r/tab_bench-DE92/
Submission Number: 26