Measuring Privacy Risks and Tradeoffs in Financial Synthetic Data Generation

Published: 03 Feb 2026, Last Modified: 03 Feb 2026
Venue: TIME 2026 Poster
License: CC BY 4.0
Keywords: Differential Privacy, Membership Inference Attack, Tabular Data, Privacy Audit, Synthetic Data Generation
TL;DR: We explore the privacy-utility tradeoff of synthetic data generation schemes on tabular financial data.
Abstract: Synthetic data generation has emerged as a practical mechanism for privacy-preserving data sharing across web platforms, e-commerce systems, and online analytics pipelines. However, the extent to which synthetic data truly protects user privacy remains poorly understood, especially for domain-specific tabular datasets that mix categorical and numerical attributes. We explore the privacy-utility tradeoff of synthetic data generation schemes on tabular financial datasets, a domain characterized by high regulatory risk and severe class imbalance. We consider representative tabular data generators, including autoencoders, generative adversarial networks, diffusion models, and copula synthesizers, and we provide novel privacy-preserving implementations of two of these. We evaluate whether, and how well, the generators simultaneously provide data quality, downstream utility, and privacy. Our results offer insight into the distinct challenges of generating synthetic data from datasets that exhibit severe class imbalance and mixed-type attributes.
Submission Number: 26