Diffusion Models for Tabular Data Imputation and Synthetic Data Generation

22 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: data imputation, synthetic data generation, diffusion models, generative models, transformers
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We present a new model that tackles two tasks at once: tabular data imputation and the generation of new synthetic samples.
Abstract: Data imputation and data generation are crucial tasks in domains ranging from healthcare to finance, where incomplete or missing data can hinder accurate analysis and decision-making. In this paper, we explore the use of diffusion models with transformer conditioning for both data imputation and data generation. Diffusion models have recently emerged as powerful generative models capable of capturing complex data distributions. By incorporating transformer conditioning, we harness the ability of transformers to model dependencies and long-range interactions within tabular data. We conduct a comprehensive evaluation, comparing diffusion models with transformer conditioning against state-of-the-art techniques such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) on benchmark datasets. For data imputation, we assess the models' ability to accurately estimate missing values while preserving the underlying data distribution. For data generation, we evaluate the quality and diversity of the synthetic samples produced by the diffusion models.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5608
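
To make the approach in the abstract concrete, the sketch below shows one plausible formulation of a transformer-conditioned denoiser for tabular diffusion in PyTorch. This is a minimal illustration under assumptions, not the submission's actual architecture: the class name TabularDenoiser, the per-column tokenization, the cosine noise schedule, and all hyperparameters are hypothetical choices made for the example.

```python
# Hypothetical sketch: each table column becomes a token, a timestep embedding
# conditions a transformer encoder, and the network predicts the added noise.
import torch
import torch.nn as nn

class TabularDenoiser(nn.Module):
    def __init__(self, n_columns: int, d_model: int = 64, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        # Lift each scalar column value to a d_model-dimensional token.
        self.col_proj = nn.Linear(1, d_model)
        # Learned per-column embeddings play the role of positional encodings.
        self.col_embed = nn.Parameter(torch.randn(n_columns, d_model) * 0.02)
        # Map the (continuous) diffusion timestep to a conditioning vector.
        self.time_embed = nn.Sequential(
            nn.Linear(1, d_model), nn.SiLU(), nn.Linear(d_model, d_model)
        )
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, 1)  # per-column noise prediction

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # x_t: (batch, n_columns) noisy rows; t: (batch,) timesteps in [0, 1].
        tokens = self.col_proj(x_t.unsqueeze(-1)) + self.col_embed   # (B, C, D)
        tokens = tokens + self.time_embed(t.view(-1, 1)).unsqueeze(1)
        return self.out(self.encoder(tokens)).squeeze(-1)            # (B, C)

# Usage: one DDPM-style training step on a batch of normalized numeric rows.
model = TabularDenoiser(n_columns=8)
x0 = torch.randn(32, 8)                       # stand-in for clean data rows
t = torch.rand(32)                            # random timesteps in [0, 1]
alpha_bar = torch.cos(t * torch.pi / 2) ** 2  # simple cosine noise schedule
noise = torch.randn_like(x0)
x_t = alpha_bar.sqrt().view(-1, 1) * x0 + (1 - alpha_bar).sqrt().view(-1, 1) * noise
loss = nn.functional.mse_loss(model(x_t, t), noise)
loss.backward()
```

For imputation, the same denoiser can be reused by clamping observed columns to their (appropriately noised) known values at each reverse step and letting the model fill in only the missing ones, a common conditioning strategy in diffusion-based imputation (e.g., CSDI-style masking); whether the submission uses this exact scheme is not stated in the abstract.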