Exploring Data Distillation for Efficient Generation of Tabular Data

27 Sept 2024 (modified: 11 Jan 2025) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: data distillation, tabular data
Abstract: Tabular data generation methods have emerged to address growing concerns about the use of sensitive tabular data for training machine learning models. Many of these methods focus on creating high-quality synthetic tabular data that can replace the original dataset while retaining generalization performance on downstream tasks and protecting sensitive records in an era where privacy is paramount. Despite their success, many of these methods face significant obstacles to wide-scale adoption, primarily due to the computational cost of data synthesis. In this paper, we propose a flexible data distillation pipeline as an alternative to conventional synthetic data generators; it achieves competitive privacy metrics while delivering significantly higher downstream performance at a fraction of the compute cost. In particular, our method accelerates data synthesis by $5\times$ on average compared to synthetic generators while also achieving superior performance.
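The abstract does not describe the pipeline itself, so as a rough illustration of what tabular data distillation means in practice, here is a minimal coreset-style sketch in plain NumPy: each class is summarized by a handful of learned centroids (per-class k-means). This is a common cheap baseline for dataset distillation, not the authors' actual method; the function name and parameters are illustrative assumptions.

```python
import numpy as np

def distill_tabular(X, y, n_per_class=10, n_iter=20, seed=0):
    """Distill (X, y) into a small synthetic set via per-class k-means.

    A coreset-style baseline for tabular dataset distillation:
    each class is replaced by `n_per_class` centroids, so a model
    can be trained on far fewer rows than the original dataset.
    (Illustrative sketch only -- not the method proposed in the paper.)
    """
    rng = np.random.default_rng(seed)
    Xs, ys = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        k = min(n_per_class, len(Xc))
        # Initialize centroids from random real rows of this class.
        centers = Xc[rng.choice(len(Xc), size=k, replace=False)].copy()
        for _ in range(n_iter):
            # Assign each row to its nearest centroid.
            d = ((Xc[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            assign = d.argmin(1)
            # Move each centroid to the mean of its assigned rows.
            for j in range(k):
                mask = assign == j
                if mask.any():
                    centers[j] = Xc[mask].mean(0)
        Xs.append(centers)
        ys.append(np.full(k, c))
    return np.concatenate(Xs), np.concatenate(ys)
```

Because the distilled set contains only `n_per_class` rows per class, downstream training cost drops roughly in proportion to the compression ratio, which is the kind of speedup (reported as ~$5\times$ here) that motivates distillation over full generative synthesis.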
Primary Area: other topics in machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9226