Utility Meets Privacy: A Critical Evaluation of Tabular Data Synthesizers

Published: 01 Jan 2025, Last Modified: 17 Jul 2025. Venue: IEEE Access 2025. License: CC BY-SA 4.0
Abstract: Evaluating synthetic data requires careful consideration of both utility and privacy. This study analyzes 12 synthesizers across 17 tabular health datasets, providing large-scale, comparable evaluation results. A novel utility-privacy score integrates a privacy measure into the evaluation, quantifying the trade-off between the two. Membership inference analysis is extended for robust privacy assessment, with reusable code provided for further research. Key findings include: 1) despite its simplicity, SMOTE achieves the best results of all synthesizers, but no single synthesizer consistently outperforms all others across all datasets; 2) utility and privacy are inherently correlated, with improvements in one compromising the other; 3) some synthesizers exhibit greater robustness to variations in data properties, such as sample and feature size, and some properties have a stronger impact than others. These findings underscore how closely the utility of a synthesizer is tied to individual datasets and privacy considerations, and highlight the importance of incorporating these aspects into future research and adopting broader, more diverse evaluation frameworks.