Structured Evaluation of Synthetic Tabular Data

18 Sept 2023 (modified: 11 Feb 2024), submitted to ICLR 2024
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Data synthesis, tabular data, evaluation, Bayesian nonparametrics
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We offer a coherent and comprehensive evaluation framework for synthetic tabular data with open-source implementation.
Abstract: Tabular data is common yet typically incomplete, small in volume, and access-restricted due to privacy concerns. Synthetic data generation offers potential solutions. Many metrics exist for evaluating the quality of synthetic tabular data; however, we lack an objective, coherent interpretation of these metrics. To address this issue, we propose an evaluation framework with a single mathematical objective, which posits that the synthetic data are drawn from the same distribution as the observed data. Through various structural decompositions of the objective, the framework reorganizes and unifies existing metrics, including those that stem from fidelity considerations, downstream applications, and model-based approaches. Moreover, the framework motivates new metrics and model-free baselines. We evaluate structurally informed synthesizers and synthesizers powered by deep learning. Using metrics derived from this comprehensive and coherent framework, we show that synthetic data generators that explicitly represent tabular structure outperform other methods, especially on smaller datasets.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1429
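
Illustrative sketch (not from the submission): the abstract's objective is that the synthetic data are drawn from the same distribution as the observed data. One simple fidelity check in that spirit compares each column's marginal distribution separately. The example below assumes categorical columns and uses total variation distance as the comparison. The metric, the function names `marginal_tv_distance` and `columnwise_tv`, and the toy tables are all hypothetical choices made for illustration. They are not the paper's metrics or its open-source implementation.

```python
from collections import Counter


def marginal_tv_distance(observed, synthetic):
    """Total variation distance between two empirical categorical distributions.

    Returns 0.0 when the empirical frequencies match exactly, 1.0 when the
    two samples share no values.
    """
    p, q = Counter(observed), Counter(synthetic)
    n_p, n_q = len(observed), len(synthetic)
    support = set(p) | set(q)
    # Counter returns 0 for values missing from one sample.
    return 0.5 * sum(abs(p[v] / n_p - q[v] / n_q) for v in support)


def columnwise_tv(observed_rows, synthetic_rows, columns):
    """Per-column TV distance for tables given as lists of dicts (assumed layout)."""
    return {
        c: marginal_tv_distance(
            [row[c] for row in observed_rows],
            [row[c] for row in synthetic_rows],
        )
        for c in columns
    }


# Toy data, invented for this example only.
observed = [
    {"color": "red", "size": "S"},
    {"color": "red", "size": "L"},
    {"color": "blue", "size": "S"},
    {"color": "blue", "size": "L"},
]
synthetic = [
    {"color": "red", "size": "S"},
    {"color": "red", "size": "S"},
    {"color": "red", "size": "L"},
    {"color": "blue", "size": "L"},
]

scores = columnwise_tv(observed, synthetic, ["color", "size"])
print(scores)
```

A per-column check like this only probes univariate marginals. Two tables can agree on every marginal and still differ in how columns depend on one another, which is why an evaluation would also need pairwise or conditional comparisons.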