Data synthesis has been advocated as an important approach for utilizing data while protecting data privacy. In recent years, a plethora of tabular data synthesis algorithms (i.e., synthesizers) have been proposed. A comprehensive understanding of these synthesizers' strengths and weaknesses remains elusive due to the absence of principled evaluation metrics and of head-to-head comparisons between state-of-the-art deep generative approaches and statistical methods. In this paper, we examine and critique existing evaluation metrics, and introduce a set of new metrics covering fidelity, privacy, and utility to address their limitations. Based on the proposed metrics, we also devise a unified objective for tuning that consistently improves the quality of synthetic data across all methods. We conduct extensive evaluations of 8 different types of synthesizers on 12 real-world datasets and identify several interesting findings, which offer new directions for privacy-preserving data synthesis.
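To illustrate the general idea of a unified tuning objective (this is only a minimal sketch, not the paper's actual formulation; the metric functions, weights, and names below are placeholder assumptions), one could aggregate fidelity, privacy, and utility scores into a single scalar that a hyperparameter tuner maximizes:

```python
# Hypothetical sketch of a unified tuning objective for a tabular synthesizer.
# The metric callables and weights are placeholders, not the paper's definitions.
from typing import Callable, Dict

import pandas as pd


def unified_objective(
    real: pd.DataFrame,
    synthetic: pd.DataFrame,
    metrics: Dict[str, Callable[[pd.DataFrame, pd.DataFrame], float]],
    weights: Dict[str, float],
) -> float:
    """Return a weighted sum of metric scores (assuming higher is better for each)."""
    return sum(weights[name] * fn(real, synthetic) for name, fn in metrics.items())


# Example usage (fidelity_score, privacy_score, utility_score are assumed helpers):
# score = unified_objective(
#     real_df, synth_df,
#     metrics={"fidelity": fidelity_score,
#              "privacy": privacy_score,
#              "utility": utility_score},
#     weights={"fidelity": 1.0, "privacy": 1.0, "utility": 1.0},
# )
# A tuner (e.g., grid search over synthesizer hyperparameters) would then keep the
# configuration whose synthetic data maximizes this score.
```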